DENIS CHETVERIKOV

Abstract. Monotonicity is a key qualitative prediction of a wide array of economic models 
derived via robust comparative statics. It is therefore important to design effective and practical 
econometric methods for testing this prediction in empirical analysis. This paper develops a 
general nonparametric framework for testing monotonicity of a regression function. Using this 
framework, a broad class of new tests is introduced, which gives an empirical researcher a lot 
of flexibility to incorporate ex ante information she might have. The paper also develops new 
methods for simulating critical values, which are based on the combination of a bootstrap procedure
and new selection algorithms. These methods yield tests that have correct asymptotic size
and are asymptotically nonconservative. It is also shown how to obtain an adaptive rate optimal 
test that has the best attainable rate of uniform consistency against models whose regression 
function has Lipschitz-continuous first-order derivatives and that automatically adapts to the 
unknown smoothness of the regression function. Simulations show that the power of the new 
tests in many cases significantly exceeds that of some prior tests, e.g. that of Ghosal, Sen, and
van der Vaart (2000). An application of the developed procedures to the dataset of Ellison and
Ellison (2011) shows that there is some evidence of strategic entry deterrence in the pharmaceutical
industry, where incumbents may use strategic investment to prevent generic entry when their
patents expire.



1. Introduction 

The concept of monotonicity often appears in economics research. For example, monotone 
comparative statics has been a popular research topic in economic theory for many years. See, 
in particular, the seminal work on this topic by Milgrom and Shannon (1994) and Athey (2002). 
Given the great deal of effort put into deriving conditions that are necessary and sufficient for 
monotonicity in theoretical models, the natural question is whether we observe monotonicity in 
the data. This paper provides a general nonparametric framework for testing monotonicity of 
a regression function. Tests of monotonicity developed in this paper can be used to evaluate 
assumptions and implications of economic theory concerning monotonicity. In addition, as was 
recently noticed by Ellison and Ellison (2011), these tests can also be used to provide evidence of
the existence of certain phenomena related to the strategic behavior of economic agents that are
difficult to detect otherwise. Several motivating examples are presented in the next section.

I start with the model

$$Y_i = f(X_i) + \varepsilon_i, \qquad (1)$$

where $Y_i$ is a scalar random variable, $\{X_i\} \subset \mathbb{R}$ is a sequence of nonstochastic design points,
$f$ is an unknown function, and $\{\varepsilon_i\}$ is a sequence of independent zero-mean unobserved scalar
random variables. Later on in the paper, I extend the analysis to cover models with multivariate
$X_i$'s. I am interested in testing the null hypothesis, $\mathcal{H}_0$, that $f(x)$ is nondecreasing against the
alternative, $\mathcal{H}_a$, that there are $x_1$ and $x_2$ such that $x_1 < x_2$ but $f(x_1) > f(x_2)$. The decision
is to be made based on the sample of size $n$, $\{X_i, Y_i\}_{1 \le i \le n}$. I assume that $f$ is smooth but do
not impose any parametric structure on it. I derive a theory that yields tests with the correct
asymptotic size. I also show how to obtain consistent tests and how to obtain a test with
the optimal rate of uniform consistency against classes of functions with Lipschitz first-order
derivatives. Moreover, the rate optimal test constructed in this paper is adaptive in the sense
that it automatically adapts to the unknown smoothness of $f$.

This paper makes several contributions. First, I introduce a general framework for testing 
monotonicity. This framework allows me to develop a broad class of new tests, which also includes 
some existing tests as special cases. This gives a researcher a lot of flexibility to incorporate ex 
ante information she might have. Second, I develop new methods to simulate the critical values 
for these tests that in many cases yield higher power than that of existing methods. Third, I 
consider the problem of testing for monotonicity in models with multiple covariates for the first 
time in the literature. As will be explained in the paper, these models are more difficult to 
analyze and require rather different treatment in comparison with the case of univariate $X_i$'s.

Constructing a critical value is an important and difficult problem in nonparametric testing.
The problem arises because most test statistics studied in the literature have some asymptotic
distribution when $f$ is constant but diverge if $f$ is strictly increasing. This discontinuity implies
that for some sequences of models $f = f_n$, the limit distribution depends on the local slope
function, which is an unknown infinite-dimensional nuisance parameter that cannot be estimated
consistently from the data. A common approach in the literature to solve this problem is to
calibrate the critical value using the case when the type I error is maximized (the least favorable
model), i.e. the model with constant $f$.¹ In contrast, I develop two selection procedures that
estimate the set where $f$ is not strictly increasing, and then adjust the critical value to account
for this set. The estimation is conducted so that no violation of the asymptotic size occurs. The
critical values obtained using these selection procedures yield valuable power improvements in
comparison with other tests if $f$ is strictly increasing over some subsets of its domain. The first
selection procedure, which is based on the one-step approach, is related to those developed in
Chernozhukov, Lee, and Rosen (2009), Andrews and Shi (2010), and Chetverikov (2012), all of
which deal with the problem of testing conditional moment inequalities. The second selection
procedure is based on the stepdown approach. It is related to methods developed in Romano
and Wolf (2005b) and Romano and Shaikh (2010). The details, however, are rather different.

¹ The exception is Wang and Meyer (2011), who use the model with an isotone estimate of $f$ to simulate the
critical value. They do not prove whether their test maintains the required size, however.

Another important issue in nonparametric testing is how to choose a smoothing parameter. In 
theory, the optimal smoothing parameter can be derived for many smoothness classes of functions
$f$. In practice, however, the smoothness class that $f$ belongs to is usually unknown. I deal with
this problem by employing the adaptive testing approach. This allows me to obtain tests with
good power properties when the information about smoothness of the function $f$ possessed by the
researcher is absent or limited. More precisely, I construct a test statistic using many different
weighting functions that correspond to many different values of the smoothing parameter so that 
the distribution of the test statistic is mainly determined by the optimal weighting function. 
I provide a basic set of weighting functions that yields a rate optimal test and show how the 
researcher can change this set in order to incorporate ex ante information. 

The literature on testing monotonicity of a nonparametric regression function is quite large. 
The tests of Gijbels, Hall, Jones, and Koch (2000) and Ghosal, Sen, and van der Vaart (2000)
(from now on, GHJK and GSV, respectively) are based on the signs of $(Y_{i+k} - Y_i)(X_{i+k} - X_i)$.
Hall and Heckman (2000) (from now on, HH) developed a test based on the slopes of local linear
estimates of $f$. The list of other papers includes Schlee (1982), Bowman, Jones, and Gijbels
(1998), Dümbgen and Spokoiny (2001), Durot (2003), Baraud, Huet, and Laurent (2005), and
Wang and Meyer (2011). Lee, Linton, and Whang (2009) and Delgado and Escanciano (2010)
derived tests of stochastic monotonicity, which means that the conditional cdf of $Y$ given $X$,
$F_{Y|X}(y, x)$, is (weakly) decreasing in $x$ for any fixed $y$.

As an empirical application of the results developed in this paper, I consider the problem of 
detecting strategic entry deterrence in the pharmaceutical industry. In that industry, incumbents 
whose drug patents are about to expire can change their investment behavior in order to prevent 
generic entries after the expiration of the patent. Although there are many theoretically
compelling arguments as to how and why incumbents should change their investment behavior (see,
for example, Tirole (1988)), the empirical evidence is rather limited. Ellison and Ellison (2011) 
showed that, under certain conditions, the dependence of investment on market size should be 
monotone if no strategic entry deterrence is present. In addition, they noted that the entry 
deterrence motive should be important in intermediate-sized markets and less important in small 
and large markets. Therefore, strategic entry deterrence might result in nonmonotonicity of
the relation between market size and investment. Hence, rejecting the null hypothesis of
monotonicity provides evidence in favor of the existence of strategic entry deterrence. I apply
the tests developed in this paper to Ellison and Ellison's dataset and show that there is some
evidence of nonmonotonicity in the data. The evidence is rather weak, though.

The rest of the paper is organized as follows. Section 2 provides motivating examples. Section 3
describes the general test statistic and gives several methods to simulate the critical value. Section
4 contains the main results under high-level conditions. Section 5 is devoted to the verification
of high-level conditions under primitive assumptions. Since in most practically relevant cases
the model also contains some additional covariates, Section 6 studies the cases of partially linear
and fully nonparametric models with multiple covariates. Section 7 presents a small Monte Carlo
simulation study. Section 8 describes the empirical application. Section 9 concludes. All proofs
are contained in the Appendix.

Notation. Throughout this paper, let $\{\epsilon_i\}$ denote a sequence of independent $N(0,1)$ random
variables that are independent of the data. The sequence $\{\epsilon_i\}$ will be used in bootstrapping
critical values. The notation $i = \overline{1,n}$ is shorthand for $i \in \{1, \dots, n\}$. For any set $S$, I denote the
number of elements in this set by $|S|$. The notation $a_n \lesssim b_n$ means that there exists a constant
$C$ independent of $n$ such that $a_n \le C b_n$. I use the symbol $C$ to denote a generic constant the value
of which may vary from line to line, and I use the symbol $C_j$ for an integer $j$ to denote a constant
the value of which is fixed throughout the paper.

2. Motivating Examples 

There are many interesting examples where testing for monotonicity can be fruitfully used in 
economics. Several examples are provided in this section. 

1. Testing implications of economic theory. Many testable implications of economic 
theory are concerned with comparative statics analysis. These implications most often take 
the form of qualitative statements like "Increasing factor X will positively (negatively) affect
response variable Y." The common approach to testing such results on the data is to look at
the corresponding coefficient in the linear (or other parametric) regression. It is said that the 
theory is confirmed if the coefficient is significant and has the expected sign. More precisely, one 
should say that the theory is "confirmed on average" because the linear regression gives average 
coefficients. This approach can be complemented by testing monotonicity. If the hypothesis of 
monotonicity is rejected, it means that the theory is lacking some empirically important features. 

For example, the classic paper of Holmstrom and Milgrom (1994) on the theory of the firm is built
around the observation that in multitask problems different incentive instruments are expected
to be complementary to each other. Indeed, increasing an incentive for one task may lead the
agent to spend too much time on that task, ignoring other responsibilities. This can be avoided
if incentives on different tasks are balanced with each other. To derive testable implications of
the theory, Holmstrom and Milgrom study a model of industrial selling introduced in Anderson 
and Schmittlein (1984) where a firm chooses between an in-house agent and an independent 
representative who divide their time into four tasks: (i) direct sales, (ii) investing in future sales 
to customers, (iii) nonsale activities, such as helping other agents, and (iv) selling the products 
of other manufacturers. Proposition 4 in their paper states that under certain conditions, the 
conditional probability of having an in-house agent is a (weakly) increasing function of the
marginal cost of evaluating performance and is a (weakly) increasing function of the importance of
nonselling activities. These are hypotheses that can be directly tested on the data by procedures 
developed in this paper. This would be an important extension of linear regression analysis 
performed, for example, in Anderson and Schmittlein (1984) and Poppo and Zenger (1998). 

2. Testing assumptions of economic theory. Monotonicity is also a key assumption 
in many economic models, especially in those concerning equilibrium analysis. For example, in 
the theory of global games it is often assumed that the profit function of an individual given 
that she chooses a particular action is nondecreasing in the proportion of her opponents who 
also choose this action, and/or that this function is nondecreasing in an exogenous parameter.
See, for example, Morris and Shin (1998), Morris and Shin (2001), and Angeletos and Werning 
(2006). 

3. Detecting strategic effects. Certain strategic effects, the existence of which is difficult 
to prove otherwise, can be detected by testing for monotonicity. An example on strategic entry 
deterrence in the pharmaceutical industry is described in the Introduction and is analyzed in 
Section 8. Below I provide another example concerned with the problem of debt pricing. Consider
a model where investors hold a collateralized debt. The debt will yield a fixed payment in 
the future if it is rolled over and an underlying project is successful. Otherwise the debt will 
yield nothing. Alternatively, all investors have an option of not rolling over and getting the 
value of the collateral immediately. The probability that the project turns out to be successful 
depends on the fundamentals and on how many investors roll over. Each investor possesses 
some information on the fundamentals. If this information is common knowledge, the price of 
the debt is clearly an increasing function of the value of the collateral. Morris and Shin (2004) 
show, however, that in the absence of common knowledge, a high value of the collateral leads
investors to believe that many other investors will not roll over, and the project will not be 
successful. This strategic effect implies that the price of the debt might decrease as the collateral 
becomes more valuable, thus causing nonmonotonicity. They argue that this effect is important 
for understanding anomalies in empirical implementation of the standard debt pricing theory 
of Merton (1974). A natural question is how to establish the existence of this effect in the data. One
possible strategy is to test whether the conditional mean of the price of the debt given the value of the
collateral is a monotonically increasing function. Rejecting the null hypothesis of monotonicity
provides evidence in favor of the existence of the strategic effect.

4. Testing assumptions of econometric models. Monotonicity is often assumed in the 
econometrics literature on estimating treatment effects. A widely used econometric model in 
this literature is as follows. Suppose that we observe a sample of individuals, $i = \overline{1,n}$. Each
individual has a random response function $y_i(t)$ that gives her response for each level of treatment
$t \in T$. Let $z_i$ and $y_i = y_i(z_i)$ denote the realized level of the treatment and the realized response,
respectively (both of them are observable). The problem is how to derive inference on
$E[y_i(t)]$. Manski and Pepper (2000) introduced the assumptions of monotone treatment response,
which imposes that $y_i(t_2) \ge y_i(t_1)$ whenever $t_2 \ge t_1$, and monotone treatment selection, which
imposes that $E[y_i(t) \mid z_i = v]$ is increasing in $v$ for all $t \in T$. The combination of these assumptions
yields a testable prediction. Indeed, for all $v_2 \ge v_1$,

$$\begin{aligned}
E[y_i \mid z_i = v_2] &= E[y_i(v_2) \mid z_i = v_2] \\
&\ge E[y_i(v_1) \mid z_i = v_2] \\
&\ge E[y_i(v_1) \mid z_i = v_1] \\
&= E[y_i \mid z_i = v_1],
\end{aligned}$$

where the first inequality follows from monotone treatment response and the second from monotone
treatment selection. Since all variables on both the left and right hand sides of this chain of
inequalities are observable, this prediction can be tested by the procedures developed in this paper.

5. Classification problems. Some concepts in economics are defined using monotonicity. 
For example, a good is called normal (inferior) if demand for this good is an increasing
(decreasing) function of income. A good is called a luxury (necessity) if the share of income spent on this
good is an increasing (decreasing) function of income. Monotonicity testing can be fruitfully
used to classify different goods using this standard terminology. A related problem arises in the
Ramsey-Cass-Koopmans growth model where one of the most important questions is whether
current savings is a nondecreasing function of the current level of capital. See, for example, Milgrom
and Shannon (1994). 

3. The Test 

3.1. The General Test Statistic. Recall that I consider the model given in equation (1), and
the test should be based on the sample $\{X_i, Y_i\}_{i=1}^n$ of $n$ observations where $X_i$ and $Y_i$ are a
nonstochastic design point and a scalar dependent random variable, respectively. In this section
and in Sections 4 and 5, I assume that $X_i \in \mathbb{R}$. The case where $X_i \in \mathbb{R}^d$ for $d > 1$ is considered
in Section 6.



TESTING REGRESSION MONOTONICITY 




Let $Q(\cdot,\cdot): \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be some weighting function satisfying $Q(x_1,x_2) = Q(x_2,x_1)$ and
$Q(x_1,x_2) \ge 0$ for all $x_1, x_2 \in \mathbb{R}$, and let

$$b = b(\{X_i,Y_i\}) = \frac{1}{2} \sum_{1 \le i,j \le n} (Y_i - Y_j)\,\mathrm{sign}(X_j - X_i)\,Q(X_i,X_j)$$

be a test function. Since $Q(X_i,X_j) \ge 0$ and $E[Y_i] = f(X_i)$, it is easy to see that under $\mathcal{H}_0$, that
is, when the function $f$ is nondecreasing, $E[b] \le 0$. On the other hand, if $\mathcal{H}_0$ is violated, there
exists a function $Q(\cdot,\cdot)$ such that $E[b] > 0$. Therefore, $b$ can be used to form a test statistic
if I can find an appropriate function $Q(\cdot,\cdot)$. For this purpose, I will use the adaptive testing
approach developed in the statistics literature. Even though this approach has attractive features, it
is almost never used in econometrics. An exception is Horowitz and Spokoiny (2001), who used
it for specification testing.

The idea behind the adaptive testing approach is to choose $Q(\cdot,\cdot)$ from a large set of potentially
useful weighting functions so as to maximize the studentized version of $b$. Formally, let $S_n$ be some
general set that depends on $n$, and for $s \in S_n$, let $Q(\cdot,\cdot,s): \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be some function
satisfying $Q(x_1,x_2,s) = Q(x_2,x_1,s)$ and $Q(x_1,x_2,s) \ge 0$ for all $x_1, x_2 \in \mathbb{R}$. In addition, let

$$b(s) = b(\{X_i,Y_i\}, s) = \frac{1}{2} \sum_{1 \le i,j \le n} (Y_i - Y_j)\,\mathrm{sign}(X_j - X_i)\,Q(X_i,X_j,s)$$

be a test function. Since the $X_i$ are nonstochastic, the variance of $b(s)$ is given by

$$V(s) = V(\{X_i\},\{\sigma_i\},s) = \sum_{1 \le i \le n} \sigma_i^2 \Big( \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,Q(X_i,X_j,s) \Big)^2,$$

where $\sigma_i = (E[\varepsilon_i^2])^{1/2}$. In general, the $\sigma_i$ are unknown and should be estimated from the data. Let
$\hat\sigma_i$ denote some (not necessarily consistent) estimator of $\sigma_i$. Available estimators are discussed
later in this section. Then the estimated variance of $b(s)$ is

$$\hat V(s) = V(\{X_i\},\{\hat\sigma_i\},s) = \sum_{1 \le i \le n} \hat\sigma_i^2 \Big( \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,Q(X_i,X_j,s) \Big)^2.$$

The general form of the test statistic that I consider in this paper is

$$T = T(\{X_i,Y_i\},\{\hat\sigma_i\},S_n) = \max_{s \in S_n} \frac{b(\{X_i,Y_i\},s)}{\sqrt{\hat V(\{X_i\},\{\hat\sigma_i\},s)}}.$$

Large values of $T$ indicate that the null hypothesis is violated. Later on in this section, I will
provide methods for estimating quantiles of $T$ under $\mathcal{H}_0$ and for choosing a critical value for the
test based on the statistic $T$.

The set $S_n$ determines adaptivity properties of the test, that is, the ability of the test to detect
many different types of deviations from $\mathcal{H}_0$. Indeed, each weighting function $Q(\cdot,\cdot,s)$ is useful
for detecting a particular type of deviation, and so the larger the set of weighting functions
$S_n$ is, the more types of deviations can be detected, and the higher is the adaptivity of the test.
In this paper, I allow for exponentially large (in the sample size $n$) sets $S_n$. This implies that
the researcher can choose a huge set of weighting functions, which allows her to detect a large
set of different deviations from $\mathcal{H}_0$. The downside of the adaptivity, however, is that expanding
the set $S_n$ increases the critical value, and thus decreases the power of the test against those
alternatives that can be detected by weighting functions already included in $S_n$. Fortunately, in
many cases the loss of power is relatively small; see, in particular, the discussion after Theorem 2 on
the dependence of critical values on the size of the set $S_n$.

3.2. Typical Weighting Functions. Let me now describe typical weighting functions.
Consider some positive compactly supported kernel function $K: \mathbb{R} \to \mathbb{R}$.² For convenience, I will
assume that the support of $K$ is $[-1, 1]$. In addition, let $s = (x, h)$ where $x$ is a location point
and $h$ is a bandwidth value. Finally, define

$$Q(x_1,x_2,(x,h)) = |x_1 - x_2|^k K((x_1 - x)/h) K((x_2 - x)/h) \qquad (2)$$

for some $k \ge 0$. I refer to this $Q$ as a kernel weighting function.

Assume that a test is based on kernel weighting functions and $S_n$ consists of pairs $s = (x, h)$
with many different values of $x$ and $h$. To explain why this test has good adaptivity properties,
consider Figure 1, which plots two regression functions. Both $f_1$ and $f_2$ violate $\mathcal{H}_0$, but the locations
where $\mathcal{H}_0$ is violated are different. In particular, $f_1$ violates $\mathcal{H}_0$ on the interval $[x_1,x_2]$ while the
corresponding interval for $f_2$ is $[x_3,x_4]$. In addition, $f_1$ is relatively less smooth than $f_2$, and
$[x_1,x_2]$ is shorter than $[x_3,x_4]$. To have good power against $f_1$, $S_n$ should contain a pair $(x,h)$
such that $[x - h, x + h] \subset [x_1,x_2]$. Indeed, if $[x - h, x + h]$ is not contained in $[x_1,x_2]$, then positive
and negative values of the summand of $b$ will cancel out, yielding a low value of $b$. In particular, it
should be the case that $x \in [x_1,x_2]$. Similarly, to have good power against $f_2$, $S_n$ should contain
a pair $(x, h)$ such that $x \in [x_3,x_4]$. Therefore, using many different values of $x$ yields a test that
adapts to the location of the deviation from $\mathcal{H}_0$. This is spatial adaptivity. Further, note that
larger values of $h$ yield smaller variance of $b$. So, given that $[x_3,x_4]$ is longer than $[x_1,x_2]$, the
optimal pair $(x, h)$ to test against $f_2$ has a larger value of $h$ than that to test against $f_1$. Therefore,
using many different values of $h$ results in adaptivity with respect to smoothness of the function,
which, in turn, determines how fast its first derivative is varying and how long the interval of
nonmonotonicity is.

The general framework considered here gives the researcher a lot of flexibility in determining
what weighting functions to use. In particular, if the researcher expects that any deviations from
$\mathcal{H}_0$, if present, are concentrated around some particular point $X_i$, then she can restrict the set $S_n$
and consider only pairs with $x = X_i$. Note that this will increase the power of the test because
smaller sets $S_n$ yield lower critical values. In addition, if it is expected that the function $f$ is
rather smooth, then the researcher can restrict the set $S_n$ by considering only pairs $(x, h)$ with
large values of $h$ since in this case deviations from $\mathcal{H}_0$, if present, are more likely to happen on
long intervals.

² The kernel function is called positive if it is positive on its support.

Figure 1. Regression Functions Illustrating Different Deviations from $\mathcal{H}_0$

Another interesting choice of the weighting functions is

$$Q(x_1,x_2,s) = \sum_{1 \le j \le m} |x_1 - x_2|^k K((x_1 - x^j)/h) K((x_2 - x^j)/h)$$

where $s = (x^1, \dots, x^m, h)$. These weighting functions are useful if the researcher expects multiple
deviations from $\mathcal{H}_0$.

If no ex ante information is available, I recommend using kernel weighting functions with
$S_n = \{(x,h) : x \in \{X_1, \dots, X_n\}, h \in H_n\}$ where $H_n = \{h = h_{\max} u^l : h \ge h_{\min},\ l = 0, 1, 2, \dots\}$ and
$h_{\max} = \max_{1 \le i,j \le n} |X_i - X_j|/2$. I refer to this $S_n$ as a basic set of weighting functions. I also
recommend setting $u = 0.5$, $h_{\min} = h_{\max}(0.3/n^{0.95})^{1/3}$, and $k = 0$ or $1$. This choice of parameters
is consistent with the theory presented in Sections 4 and 5 and has worked well in simulations.
The value of $h_{\min}$ is selected so that the test function $b(s)$ for any given $s$ uses no less than
approximately 15 observations when $n = 100$.
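The basic set can be sketched in a few lines of Python. The sketch assumes the parameter values $u = 0.5$ and $h_{\min} = h_{\max}(0.3/n^{0.95})^{1/3}$ as read from the recommendation above; the Epanechnikov-type kernel and all function names are illustrative choices, not something the paper prescribes.

```python
def kernel(u):
    """Illustrative kernel: positive on (-1, 1), zero outside (an assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def bandwidth_grid(X, u=0.5):
    """H_n = {h_max * u^l : h >= h_min, l = 0, 1, 2, ...} for the recommended h_min."""
    n = len(X)
    h_max = (max(X) - min(X)) / 2.0
    if h_max <= 0.0:
        return []  # degenerate design: all points coincide
    h_min = h_max * (0.3 / n ** 0.95) ** (1.0 / 3.0)
    grid = []
    h = h_max
    while h >= h_min:
        grid.append(h)
        h *= u
    return grid

def basic_set(X, u=0.5):
    """S_n = {(x, h) : x in {X_1, ..., X_n}, h in H_n}."""
    grid = bandwidth_grid(X, u)
    return [(x, h) for x in X for h in grid]

def kernel_weight(x1, x2, s, k=0):
    """Kernel weighting function Q(x1, x2, (x, h)) from equation (2)."""
    x, h = s
    return abs(x1 - x2) ** k * kernel((x1 - x) / h) * kernel((x2 - x) / h)

# Hypothetical equally spaced design on [0, 1] with n = 100.
X = [i / 99.0 for i in range(100)]
H_n = bandwidth_grid(X)   # three bandwidths: 0.5, 0.25, 0.125
S_n = basic_set(X)        # 100 locations times 3 bandwidths
```

For this design the smallest bandwidth, 0.125, gives a window of width 0.25, which covers roughly a quarter of the sample, so every $b(s)$ in the sketch uses more than the 15 observations mentioned in the text.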

3.3. Comparison with Other Known Tests. I will now show that the general framework
described above includes the HH test statistic and a slightly modified version of the GSV test
statistic as special cases that correspond to different values of $k$ in the definition of kernel
weighting functions.

GSV use the following test function:

$$b(s) = \frac{1}{2} \sum_{1 \le i,j \le n} \mathrm{sign}(Y_i - Y_j)\,\mathrm{sign}(X_j - X_i)\,K((X_i - x)/h) K((X_j - x)/h),$$

whereas setting $k = 0$ in equation (2) yields

$$b(s) = \frac{1}{2} \sum_{1 \le i,j \le n} (Y_i - Y_j)\,\mathrm{sign}(X_j - X_i)\,K((X_i - x)/h) K((X_j - x)/h),$$

and so the only difference is that I include the term $(Y_i - Y_j)$ whereas they use $\mathrm{sign}(Y_i - Y_j)$. It
will be shown in the next section that my test is consistent. On the other hand, I claim that the GSV
test is not consistent in the presence of conditional heteroscedasticity. Indeed, assume that
$f(X_i) = -X_i$, and that $\varepsilon_i$ is $-2X_i$ or $2X_i$ with equal probabilities. Then $(Y_i - Y_j)(X_j - X_i) > 0$
if and only if $(\varepsilon_i - \varepsilon_j)(X_j - X_i) > 0$, and so the probability of rejecting $\mathcal{H}_0$ for the GSV test is
numerically equal to that in the model with $f(X_i) = 0$ for $i = \overline{1,n}$. But the latter probability
does not exceed the size of the test. This implies that the GSV test is not consistent since
it maintains the required size asymptotically. Moreover, they consider a unique nonstochastic
value of $h$, which means that the GSV test is nonadaptive with respect to the smoothness of the
function $f$.
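The sign equivalence used in this counterexample can be checked numerically. The sketch below assumes strictly positive design points (drawn uniformly from $[0.1, 1]$, an assumption of the example only), draws $\varepsilon_i \in \{-2X_i, 2X_i\}$ with equal probabilities, and verifies that $\mathrm{sign}(Y_i - Y_j)$ coincides with $\mathrm{sign}(\varepsilon_i - \varepsilon_j)$ for every pair, so that a statistic built from signs of response differences cannot distinguish $f(x) = -x$ from $f(x) = 0$.

```python
import random

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def signs_agree(n=50, seed=0):
    """Check sign(Y_i - Y_j) == sign(eps_i - eps_j) for all pairs in the example."""
    rng = random.Random(seed)
    X = [rng.uniform(0.1, 1.0) for _ in range(n)]       # assumption: X_i > 0
    eps = [rng.choice((-2.0, 2.0)) * x for x in X]      # eps_i is -2 X_i or 2 X_i
    Y = [-x + e for x, e in zip(X, eps)]                # Y_i = f(X_i) + eps_i, f(x) = -x
    return all(
        sign(Y[i] - Y[j]) == sign(eps[i] - eps[j])
        for i in range(n)
        for j in range(n)
    )

agree = signs_agree()
```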

Let me now consider the HH test. The idea of this test is to make use of local linear estimates
of the slope of the function $f$. Using well-known formulas for the OLS regression, it is easy to
show that the slope estimate of the function $f$ given the data $\{X_i, Y_i\}_{i=s_1+1}^{s_2}$ with $s_1 < s_2$, where
$\{X_i\}_{i=1}^n$ is an increasing sequence, is given by

$$\frac{(s_2 - s_1) \sum_{s_1 < i \le s_2} Y_i X_i - \sum_{s_1 < i \le s_2} Y_i \sum_{s_1 < i \le s_2} X_i}{(s_2 - s_1) \sum_{s_1 < i \le s_2} X_i^2 - \big( \sum_{s_1 < i \le s_2} X_i \big)^2}, \qquad (3)$$

where $s = (s_1,s_2)$. Note that the denominator of (3) is nonstochastic, and so it disappears after
studentization. In addition, simple rearrangements show that the numerator in (3) is, up to sign, equal to

$$\frac{1}{2} \sum_{1 \le i,j \le n} (Y_i - Y_j)(X_j - X_i)\,1\{x - h \le X_i \le x + h\}\,1\{x - h \le X_j \le x + h\} \qquad (4)$$

for some $x$ and $h$. On the other hand, setting $k = 1$ in equation (2) yields

$$b(s) = \frac{1}{2} \sum_{1 \le i,j \le n} (Y_i - Y_j)(X_j - X_i)\,K((X_i - x)/h) K((X_j - x)/h). \qquad (5)$$

Noting that the expression in (4) is proportional to that on the right hand side in (5) with $K(\cdot) =
1\{[-1,+1]\}(\cdot)$ implies that the HH test statistic is a special case of those studied in this paper.
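The rearrangement behind the HH comparison can be verified on a small example. The sketch below takes a window that contains all observations, writes the OLS slope numerator in the form $m \sum Y_i X_i - \sum Y_i \sum X_i$ with $m$ observations (an assumed normalization), and compares it with the pairwise sum $(1/2)\sum_{i,j}(Y_i - Y_j)(X_j - X_i)$. The two agree up to sign, which is why a steep negative local slope corresponds to a large positive value of the test function. Function names and data are hypothetical.

```python
def ols_slope_numerator(X, Y):
    """m * sum(Y_i X_i) - sum(Y_i) * sum(X_i): numerator of the OLS slope (scaled by m)."""
    m = len(X)
    return m * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)

def pairwise_form(X, Y):
    """(1/2) * sum over i, j of (Y_i - Y_j) * (X_j - X_i), i.e. b(s) with k = 1 and K = 1."""
    m = len(X)
    return 0.5 * sum(
        (Y[i] - Y[j]) * (X[j] - X[i]) for i in range(m) for j in range(m)
    )

# Hypothetical window with a decreasing relationship.
X = [1.0, 2.0, 3.0]
Y = [3.0, 2.0, 1.0]
numerator = ols_slope_numerator(X, Y)   # negative: the fitted slope is negative
b_value = pairwise_form(X, Y)           # positive: evidence against monotonicity
```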
3.4. Estimating $\sigma_i$. In practice, $\sigma_i$ is usually unknown, and, hence, should be estimated from
the data. Let $\hat\sigma_i$ denote some estimator of $\sigma_i$. I provide results for two types of estimators. The
first type is easier to implement, but the second worked better in simulations.

First, $\sigma_i$ can be estimated by the residual $\hat\varepsilon_i$. More precisely, let $\hat f$ be some uniformly consistent
estimator of $f$ with at least a polynomial rate of consistency in probability, i.e. $\hat f(X_i) - f(X_i) =
O_p(n^{-\kappa_1})$ uniformly over $i = \overline{1,n}$ for some $\kappa_1 > 0$, and let $\hat\sigma_i = \hat\varepsilon_i$ where $\hat\varepsilon_i = Y_i - \hat f(X_i)$. Note that
$\hat\sigma_i$ can be negative. Clearly, $\hat\sigma_i$ is not a consistent estimator of $\sigma_i$. Nevertheless, as I will show in
Section 4, this estimator leads to valid inference. Intuitively, it works because the test statistic
contains the weighted average sum of $\hat\sigma_i^2$, $i = \overline{1,n}$, and the estimation error averages out. To
obtain a uniformly consistent estimator $\hat f$ of $f$, one can use a series method (see Newey (1997),
theorem 1) or local polynomial regression (see Tsybakov (2009), theorem 1.8). If one prefers
kernel methods, it is important to use generalized kernels in order to deal with boundary effects
when higher order kernels are used; see, for example, Müller (1991). Alternatively, one can choose
$S_n$ so that boundary points are excluded from the test statistic. In addition, if the researcher
decides to impose some parametric structure on the set of potentially possible functions, then
parametric methods like OLS will typically give uniform consistency with $\kappa_1$ arbitrarily close to
1/2.

The second way of estimating $\sigma_i$ is to use a parametric or nonparametric estimator $\hat\sigma_i$ satisfying
$\hat\sigma_i - \sigma_i = O_p(n^{-\kappa_2})$ uniformly over $i = \overline{1,n}$ for some $\kappa_2 > 0$. Many estimators of $\sigma_i$ satisfy this
condition. Assume that the data $\{X_i, Y_i\}_{i=1}^n$ are arranged so that $X_i \le X_j$ whenever $i \le j$. Then
the estimator of Rice (1984), given by

$$\hat\sigma = \Big( \frac{1}{2n} \sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2 \Big)^{1/2}, \qquad (6)$$

is $\sqrt{n}$-consistent if $\sigma_i = \sigma$ for all $i = \overline{1,n}$ and $f$ is piecewise Lipschitz-continuous.

The Rice estimator can be easily modified to allow for conditional heteroscedasticity. Choose
a bandwidth value $b_n > 0$. For $i = \overline{1,n}$, let $J(i) = \{j = \overline{1,n} : |X_j - X_i| \le b_n\}$. Let $|J(i)|$ denote
the number of elements in $J(i)$. Then $\sigma_i$ can be estimated by

$$\hat\sigma_i = \Big( \frac{1}{2|J(i)|} \sum_{j \in J(i):\, j+1 \in J(i)} (Y_{j+1} - Y_j)^2 \Big)^{1/2}. \qquad (7)$$

I refer to (7) as a local version of Rice's estimator. An advantage of this estimator is that it
is adaptive with respect to the smoothness of the function $f$. Lemma 2 in Section 5 provides
conditions that are sufficient for uniform consistency of this estimator with at least a polynomial
rate. The key condition there is that $|\sigma_{j+1} - \sigma_j| \le C|X_{j+1} - X_j|$ for some $C > 0$ and all
$j = \overline{1,n-1}$. The intuition for consistency is as follows. Note that $X_{j+1}$ is close to $X_j$. So, if the
function $f$ is continuous, then

$$Y_{j+1} - Y_j = f(X_{j+1}) - f(X_j) + \varepsilon_{j+1} - \varepsilon_j \approx \varepsilon_{j+1} - \varepsilon_j,$$

so that

$$E[(Y_{j+1} - Y_j)^2] \approx \sigma_{j+1}^2 + \sigma_j^2$$

since $\varepsilon_{j+1}$ is independent of $\varepsilon_j$. Further, if $b_n$ is sufficiently small, then $\sigma_{j+1}^2 + \sigma_j^2 \approx 2\sigma_i^2$ since
$|X_{j+1} - X_i| \le b_n$ and $|X_j - X_i| \le b_n$, and so $\hat\sigma_i^2$ is close to $\sigma_i^2$. Other available estimators are
presented, for example, in Müller and Stadtmüller (1987), Fan and Yao (1998), Horowitz and
Spokoiny (2001), and Härdle and Tsybakov (1997).
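A difference-based estimator of this kind is short to code. The sketch below is one possible reading of the Rice estimator and its local version: the normalizations $1/(2n)$ and $1/(2|J(i)|)$ are assumptions of this example (other conventions such as $1/(2(n-1))$ differ only by a factor that vanishes asymptotically), the data are assumed to be sorted by $X$, and the function names are hypothetical.

```python
import math

def rice_sigma(Y):
    """Global estimator: sqrt((1/(2n)) * sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2).

    Assumes the observations are ordered so that X_1 <= X_2 <= ... <= X_n.
    """
    n = len(Y)
    ss = sum((Y[i + 1] - Y[i]) ** 2 for i in range(n - 1))
    return math.sqrt(ss / (2.0 * n))

def local_rice_sigma(X, Y, b_n):
    """Local version: for each i, use only first differences inside J(i).

    J(i) = {j : |X_j - X_i| <= b_n}; a difference Y_{j+1} - Y_j is used
    when both j and j + 1 belong to J(i).
    """
    n = len(X)
    estimates = []
    for i in range(n):
        J = {j for j in range(n) if abs(X[j] - X[i]) <= b_n}
        ss = sum((Y[j + 1] - Y[j]) ** 2 for j in J if j + 1 in J)
        estimates.append(math.sqrt(ss / (2.0 * len(J))))
    return estimates

# Hypothetical sorted sample.
X = [0.0, 1.0, 2.0, 3.0]
Y = [0.0, 2.0, 0.0, 2.0]
sigma_global = rice_sigma(Y)
sigma_local = local_rice_sigma(X, Y, b_n=1.0)
```

When `b_n` is at least as large as the range of `X`, every `J(i)` contains the whole sample and the local estimates coincide with the global one, which gives a simple sanity check on an implementation.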

3.5. Simulating the Critical Value. In this subsection, I provide three different methods for 
estimating quantiles of the null distribution of the test statistic T. These are plug-in, one-step, 
and stepdown methods. All of these methods are based on the procedure known as the wild
bootstrap. The wild bootstrap was introduced in Wu (1986) and used by Liu (1988), Mammen
(1993), Härdle and Mammen (1993), Horowitz and Spokoiny (2001), and Chetverikov (2012). See
also Chernozhukov, Chetverikov, and Kato (2012). The three methods are arranged in terms of
increasing power and computational complexity. The validity of all three methods is established
in Theorem 1. Recall that $\{\epsilon_i\}$ denotes a sequence of independent $N(0,1)$ random variables that
are independent of the data.

Plug-in Approach. Suppose that we want to obtain a test of size $\alpha$. The plug-in approach is
based on two observations. First, under $H_0$,

$$b(s) = (1/2) \sum_{1 \le i,j \le n} (f(X_i) - f(X_j) + \varepsilon_i - \varepsilon_j)\,\mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s) \qquad (8)$$

$$\le (1/2) \sum_{1 \le i,j \le n} (\varepsilon_i - \varepsilon_j)\,\mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s) \qquad (9)$$

since $Q(X_i, X_j, s) \ge 0$ and $f(X_i) \ge f(X_j)$ whenever $X_i \ge X_j$ under $H_0$, and so the $(1 - \alpha)$
quantile of $T$ is bounded from above by the $(1 - \alpha)$ quantile of $T$ in the model with $f(x) = 0$
for all $x \in \mathbb{R}$, which is the least favorable model under $H_0$. Second, it will be shown that the
distribution of $T$ asymptotically depends on the distribution of noise $\{\varepsilon_i\}$ only through $\{\sigma_i^2\}$.
These two observations suggest that the critical value for the test can be obtained by simulating
the conditional $(1 - \alpha)$ quantile of $T^* = T(\{X_i, Y_i^*\}, \{\hat\sigma_i\}, S_n)$ given $\{\hat\sigma_i\}$ where $Y_i^* = \hat\sigma_i \epsilon_i$ for
$i = 1, \dots, n$. This is called the plug-in critical value $c^{PI}_{1-\alpha}$. See the Appendix for detailed
step-by-step instructions.
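
The plug-in simulation can be sketched in a few lines. This is a hedged illustration, not the paper's step-by-step algorithm from the Appendix. It assumes a symmetric product-kernel weighting function $Q(x_1, x_2, (c, h)) = K((x_1 - c)/h) K((x_2 - c)/h)$ with $K(u) = \max(1 - u^2, 0)$, so that $b(s) = \sum_i Y_i a_i(s)$ with $a_i(s) = \sum_j \mathrm{sign}(X_j - X_i) Q(X_i, X_j, s)$, and it assumes the studentization $\hat V(s) = \sum_i \hat\sigma_i^2 a_i(s)^2$. The names `weight_rows` and `plug_in_critical_value` are hypothetical.

```python
import numpy as np

def weight_rows(x, centers, bandwidths):
    """Stack the vectors a(s) with a_i(s) = sum_j sign(X_j - X_i) Q(X_i, X_j, s).

    Illustrative choice (an assumption): Q(x1, x2, (c, h)) = K((x1 - c)/h) K((x2 - c)/h)
    with K(u) = max(1 - u^2, 0). Because Q is symmetric, b(s) = sum_i Y_i a_i(s).
    """
    sgn = np.sign(x[None, :] - x[:, None])            # sgn[i, j] = sign(X_j - X_i)
    rows = []
    for c in centers:
        for h in bandwidths:
            k = np.maximum(1.0 - ((x - c) / h) ** 2, 0.0)
            rows.append((sgn * np.outer(k, k)).sum(axis=1))
    return np.array(rows)

def plug_in_critical_value(a, sigma_hat, alpha, n_boot=1000, seed=0):
    """Simulated (1 - alpha) quantile of T* = max_s b*(s) / sqrt(Vhat(s))."""
    rng = np.random.default_rng(seed)
    scale = np.sqrt((a ** 2) @ sigma_hat ** 2)        # sqrt(Vhat(s)), one entry per s
    keep = scale > 0                                  # skip degenerate weighting functions
    # Bootstrap data Y*_i = sigma_hat_i * eps_i with eps_i independent N(0, 1).
    y_star = rng.standard_normal((n_boot, sigma_hat.size)) * sigma_hat
    t_star = ((y_star @ a[keep].T) / scale[keep]).max(axis=1)
    return float(np.quantile(t_star, 1.0 - alpha))
```

With these helpers, the test would reject when the observed statistic `((a @ y) / scale).max()` exceeds the returned value. Because the statistic is studentized, rescaling all `sigma_hat` by a common factor leaves the simulated critical value unchanged.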
One-Step Approach. The test with the plug-in critical value is computationally rather simple.
It has, however, poor power properties. Indeed, the distribution of $T$ in general depends on $f$ but
the plug-in approach is based on the least favorable regression function $f = 0$, and so it is too
conservative when $f$ is strictly increasing. More formally, suppose for example that a kernel weighting
function is used, and that $f$ is strictly increasing in the $h$-neighborhood of $X_i$ but is constant in the
$h$-neighborhood of $X_j$. Let $s_1 = s(X_i, h)$ and $s_2 = s(X_j, h)$. Then $b(s_1)/(\hat V(s_1))^{1/2}$ is no greater
than $b(s_2)/(\hat V(s_2))^{1/2}$ with probability approaching one. On the other hand, $b(s_1)/(\hat V(s_1))^{1/2}$
is greater than $b(s_2)/(\hat V(s_2))^{1/2}$ with nontrivial probability in the model with $f(x) = 0$ for all
$x \in \mathbb{R}$, which is used to obtain $c^{PI}_{1-\alpha}$. Therefore, $c^{PI}_{1-\alpha}$ overestimates the corresponding quantile
of $T$. The natural idea to overcome the conservativeness of the plug-in approach is to simulate a
critical value using not all elements of $S_n$ but only those that are relevant for the given sample.
Two selection procedures developed in this paper are used to decide what elements of $S_n$ should
be used in the simulation. The main difficulty here is to make sure that the selection procedures
do not distort the size of the test. The simpler of these two procedures is the one-step approach.

Let $\{\gamma_n\}$ be a sequence of positive numbers converging to zero, and let $c^{PI}_{1-\gamma_n}$ be the $(1 - \gamma_n)$
plug-in critical value. In addition, denote

$$S_n^{OS} = S_n^{OS}(\{X_i, Y_i\}, \{\hat\sigma_i\}, S_n) = \{s \in S_n : b(s)/(\hat V(s))^{1/2} > -2c^{PI}_{1-\gamma_n}\}.$$

Then the one-step critical value $c^{OS}_{1-\alpha}$ is the conditional $(1 - \alpha)$ quantile of the simulated statistic
$T^* = T(\{X_i, Y_i^*\}, \{\hat\sigma_i\}, S_n^{OS})$ given $\{\hat\sigma_i\}$ and $S_n^{OS}$ where $Y_i^* = \hat\sigma_i \epsilon_i$ for $i = 1, \dots, n$. Intuitively, the
one-step critical value works because the weighting functions corresponding to elements of the set
$S_n \setminus S_n^{OS}$ have an asymptotically negligible influence on the distribution of $T$ under $H_0$. Indeed,
the probability that at least one element $s$ of $S_n$ such that

$$(1/2) \sum_{1 \le i,j \le n} (f(X_i) - f(X_j))\,\mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s)/(V(s))^{1/2} > -c^{PI}_{1-\gamma_n} \qquad (10)$$

belongs to the set $S_n \setminus S_n^{OS}$ is at most $\gamma_n + o(1)$. On the other hand, the probability that at least one
element $s$ of $S_n$ such that inequality (10) does not hold for this element gives $b(s)/(\hat V(s))^{1/2} > 0$
is again at most $\gamma_n + o(1)$. Since $\gamma_n$ converges to zero, this suggests that the critical value can
be simulated using only elements of $S_n^{OS}$. In practice, one can set $\gamma_n$ as a small fraction of $\alpha$.
For example, the Monte Carlo simulations presented in this paper use $\gamma_n = 0.01$ with $\alpha = 0.1$.
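
The one-step selection can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the matrix `a` is assumed to hold one row per weighting function with entries $a_i(s) = \sum_j \mathrm{sign}(X_j - X_i) Q(X_i, X_j, s)$ for a symmetric $Q$ (so that $b(s) = \sum_i Y_i a_i(s)$), $\hat V(s) = \sum_i \hat\sigma_i^2 a_i(s)^2$ is assumed positive for every row, and the same simulated draws are reused for both quantiles. The helper names are hypothetical, and the sketch returns `nan` when the selected set is empty and leaves that convention to the caller.

```python
import numpy as np

def simulated_quantile(a, sigma_hat, level, eps):
    """Quantile of T* = max_s sum_i a_i(s) sigma_hat_i eps_i / sqrt(Vhat(s))."""
    if a.shape[0] == 0:
        return float("nan")                       # empty selection: convention left to the caller
    scale = np.sqrt((a ** 2) @ sigma_hat ** 2)
    t_star = ((eps * sigma_hat) @ a.T / scale).max(axis=1)
    return float(np.quantile(t_star, level))

def one_step_critical_value(a, y, sigma_hat, alpha=0.10, gamma=0.01,
                            n_boot=2000, seed=0):
    """Return (c_OS, mask of selected weighting functions)."""
    eps = np.random.default_rng(seed).standard_normal((n_boot, y.size))
    stat = (a @ y) / np.sqrt((a ** 2) @ sigma_hat ** 2)   # b(s) / sqrt(Vhat(s))
    c_pi_gamma = simulated_quantile(a, sigma_hat, 1.0 - gamma, eps)
    selected = stat > -2.0 * c_pi_gamma                   # the set S_n^OS
    c_os = simulated_quantile(a[selected], sigma_hat, 1.0 - alpha, eps)
    return c_os, selected
```

Weighting functions located where the data are strongly increasing produce very negative studentized values and are dropped, so the maximum in the simulation runs over fewer elements and the critical value can only decrease relative to the plug-in one.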

Stepdown Approach. The one-step approach, as the name suggests, uses only one step to cut
out those elements of $S_n$ that have negligible influence on the distribution of $T$. It turns out that
this step can be iterated using the stepdown procedure, yielding second-order improvements
in the power. Stepdown procedures were developed in the literature on multiple hypothesis
testing; see, in particular, Holm (1979), Romano and Wolf (2005a), Romano and Wolf (2005b),
and Romano and Shaikh (2010), and Lehmann and Romano (2005) for a textbook introduction.
The use of the stepdown method in this paper, however, is rather different.

^As usual, I define the maximum over the empty set as $+\infty$, and so $c^{OS}_{1-\alpha} = +\infty$ if $S_n^{OS}$ is empty.

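To explain the stepdown approach, let me define the sequences $(c^l_{1-\gamma_n})_{l=1}^\infty$ and $(S_n^l)_{l=1}^\infty$. Set
$c^1_{1-\gamma_n} = c^{OS}_{1-\gamma_n}$ and $S_n^1 = S_n^{OS}$. Then for $l > 1$, let $c^l_{1-\gamma_n}$ be the conditional $(1 - \gamma_n)$ quantile of
$T^* = T(\{X_i, Y_i^*\}, \{\hat\sigma_i\}, S_n^l)$ given $\{\hat\sigma_i\}$ and $S_n^l$ where $Y_i^* = \hat\sigma_i \epsilon_i$ for $i = 1, \dots, n$ and

$$S_n^l = S_n^l(\{X_i, Y_i\}, \{\hat\sigma_i\}, S_n) = \{s \in S_n : b(s)/(\hat V(s))^{1/2} > -c^{PI}_{1-\gamma_n} - c^{l-1}_{1-\gamma_n}\}.$$

It is easy to see that $(c^l_{1-\gamma_n})_{l=1}^\infty$ is a decreasing sequence, and so $S_n^l \supseteq S_n^{l+1}$ for all $l \ge 1$.
Since $S_n^1$ is a finite set, $S_n^{l(0)} = S_n^{l(0)+1}$ for some $l(0) \ge 1$ and $S_n^l = S_n^{l+1}$ for all $l \ge l(0)$.
Let $S_n^{SD} = S_n^{l(0)}$. Then the stepdown critical value $c^{SD}_{1-\alpha}$ is the conditional $(1 - \alpha)$ quantile of
$T^* = T(\{X_i, Y_i^*\}, \{\hat\sigma_i\}, S_n^{SD})$ given $\{\hat\sigma_i\}$ and $S_n^{SD}$ where $Y_i^* = \hat\sigma_i \epsilon_i$ for $i = 1, \dots, n$.

Note that $S_n^{SD} \subset S_n^{OS} \subset S_n$, and so $c^{SD}_\eta \le c^{OS}_\eta \le c^{PI}_\eta$ for any $\eta \in (0, 1)$. This explains why
the three methods for simulating the critical values are arranged in terms of increasing power.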
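
The stepdown iteration can be sketched as a loop that shrinks the selected set until it stops changing. This is a hedged illustration with hypothetical helper names. It assumes, as an illustration only, that `a` holds one row per weighting function with entries $a_i(s) = \sum_j \mathrm{sign}(X_j - X_i) Q(X_i, X_j, s)$ for symmetric $Q$, that $\hat V(s) = \sum_i \hat\sigma_i^2 a_i(s)^2 > 0$, and that one set of simulated draws is reused across iterations. The empty-set case returns `nan` and leaves the convention to the caller.

```python
import numpy as np

def sim_quantile(a, sigma_hat, level, eps):
    """Quantile of T* = max_s sum_i a_i(s) sigma_hat_i eps_i / sqrt(Vhat(s))."""
    scale = np.sqrt((a ** 2) @ sigma_hat ** 2)
    t_star = ((eps * sigma_hat) @ a.T / scale).max(axis=1)
    return float(np.quantile(t_star, level))

def stepdown_critical_value(a, y, sigma_hat, alpha=0.10, gamma=0.01,
                            n_boot=2000, seed=0):
    """Return (c_SD, mask of the final selected set)."""
    eps = np.random.default_rng(seed).standard_normal((n_boot, y.size))
    stat = (a @ y) / np.sqrt((a ** 2) @ sigma_hat ** 2)   # b(s) / sqrt(Vhat(s))
    c_pi = sim_quantile(a, sigma_hat, 1.0 - gamma, eps)
    selected = np.ones(a.shape[0], dtype=bool)            # start from the full set S_n
    c_prev = c_pi                                         # first pass reproduces S_n^OS
    while True:
        new_selected = stat > -c_pi - c_prev
        if not new_selected.any():
            return float("nan"), new_selected             # empty set: convention left to the caller
        if np.array_equal(new_selected, selected):
            break                                         # the selected set has stabilized
        selected = new_selected
        c_prev = sim_quantile(a[selected], sigma_hat, 1.0 - gamma, eps)
    return sim_quantile(a[selected], sigma_hat, 1.0 - alpha, eps), selected
```

Because the draws are shared, each pass maximizes over a subset of the previous set, so the intermediate quantiles are non-increasing and the loop terminates after finitely many passes.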

4. Theory under High-Level Conditions 

This section describes the high-level assumptions used in this paper and presents the main 
results under these assumptions. 

Let $C_1$, $C_2$, $\phi$, $\kappa_1$, $\kappa_2$, and $\kappa_3$ be some strictly positive constants. The size properties of the
test will be obtained under the following assumptions.

A1. $E[|\varepsilon_i|^{4+\phi}] \le C_1$ and $\sigma_i \ge C_2$ for all $i = 1, \dots, n$.

This is a mild assumption on the moments of disturbances. The condition $\sigma_i \ge C_2$ for all $i = 1, \dots, n$
precludes the existence of super-efficient estimators.

Recall that the results in this paper are obtained for two types of estimators of $\sigma_i$. When
$\hat\sigma_i = \hat\varepsilon_i = Y_i - \hat f(X_i)$ for some estimator $\hat f$ of $f$, I will assume

A2. (i) $\hat\sigma_i = Y_i - \hat f(X_i)$ for all $i = 1, \dots, n$ and (ii) $\hat f(X_i) - f(X_i) = o_p(n^{-\kappa_1})$ uniformly over $i = 1, \dots, n$.

This assumption is satisfied for many parametric and nonparametric estimators of $f$; see, in
particular, Subsection 3.4. When $\hat\sigma_i$ is some consistent estimator of $\sigma_i$, I will assume

A3. $\hat\sigma_i - \sigma_i = o_p(n^{-\kappa_2})$ uniformly over $i = 1, \dots, n$.

See Subsection 3.4 for different available estimators. See also Section 5 and Lemma 2 in particular
where Assumption A3 is proven for the local version of Rice's estimator.

A4. $(V(s)/\hat V(s))^{1/2} - 1 = o_p(n^{-\kappa_3})$ and $(\hat V(s)/V(s))^{1/2} - 1 = o_p(n^{-\kappa_3})$ uniformly over $s \in S_n$.
This is a high-level assumption that will be verified for particular choices of the weighting
functions under more primitive conditions in the next section (Lemma 3).

Let

$$A_n = \max_{s \in S_n} \max_{1 \le i \le n} \Big| \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s)/(V(s))^{1/2} \Big|.$$

I refer to $A_n$ as a sensitivity parameter. It provides an upper bound on how much any test
function depends on a particular observation. Intuitively, approximation of the distribution of
the test statistic is possible only if $A_n$ is sufficiently small.
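
For a given design and set of weighting functions, the sensitivity parameter is straightforward to compute. The sketch below is illustrative only. It assumes that row $s$ of `a` contains $a_i(s) = \sum_j \mathrm{sign}(X_j - X_i) Q(X_i, X_j, s)$ and that $V(s) = \sum_i \sigma_i^2 a_i(s)^2$; the function name is hypothetical.

```python
import numpy as np

def sensitivity_parameter(a, sigma):
    """A_n = max_s max_i |a_i(s)| / sqrt(V(s)), with V(s) = sum_i sigma_i^2 a_i(s)^2."""
    v = (a ** 2) @ sigma ** 2                      # V(s) for every weighting function
    return float(np.max(np.abs(a).max(axis=1) / np.sqrt(v)))
```

A weighting function that loads on very few observations yields a ratio close to $1/\sigma_i$, whereas one that spreads its weight evenly over $m$ observations yields a ratio of order $m^{-1/2}$, which is why narrow bandwidths inflate $A_n$.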

A5. nAnilogpY^'^ = 0(1) where p = the number of elements in the setSn- In addition, if 
holds, then for some -2 < (pi < cj), (i) {\ogpf /n^'^+'t>y^'^+'i'^') = o(l), (ii) A^n2/(^+'^i)(logp)3 = 
0(1), (Hi) yl^(logp)^ = 0(1), and (iv) logp/n'^^^'^^ = o(l). Finally, if is satisfied, then 
logp/n"^^"-' = 0(1). 

This is a key growth assumption that restricts the choice of the weighting functions and, hence,
the set $S_n$. Note that this condition includes $p$ only through $\log p$, and so it allows an
exponentially large (in the sample size $n$) number of weighting functions. Lemma 3 in the next section
provides an upper bound on $A_n$ for some choices of weighting functions, allowing me to verify
this Assumption.

Let $\mathcal{M}$ be a class of models given by equation (1), regression function $f$, design points $\{X_i\}$,
distribution of $\{\varepsilon_i\}$, weighting functions $Q(\cdot, \cdot, s)$ for $s \in S_n$, and estimators $\{\hat\sigma_i\}$ such that
uniformly over this class, (i) Assumptions A1, A4, and A5 are satisfied, and (ii) either Assumption
A2 or A3 holds. For $M \in \mathcal{M}$, let $P_M(\cdot)$ denote the probability under the distributions in the
model $M$. Then

Theorem 1. Let P = PI, OS, or SD. Let $\mathcal{M}_0$ denote the set of all models $M \in \mathcal{M}$ satisfying
$H_0$. Then

$$\inf_{M \in \mathcal{M}_0} P_M(T \le c^P_{1-\alpha}) \ge 1 - \alpha + o(1) \text{ as } n \to \infty.$$

In addition, let $\mathcal{M}_{00}$ denote the set of all models $M \in \mathcal{M}_0$ such that $f = C$ for some constant
$C$. Then

$$\sup_{M \in \mathcal{M}_{00}} P_M(T \le c^P_{1-\alpha}) = 1 - \alpha + o(1) \text{ as } n \to \infty.$$

^Assumptions A2, A3, and A4 contain statements of the form $Z = o_p(n^{-\kappa})$ for some random variable $Z$ and
$\kappa > 0$. I say that these assumptions hold uniformly over a class of models if for any $C > 0$, $P(|Z| > Cn^{-\kappa}) = o(1)$
uniformly over this class. Note that this notion of uniformity is weaker than uniform convergence in probability.
In addition, it applies to random variables defined on different probability spaces.

Comment 1. (i) This Theorem states that the Wild Bootstrap combined with the selection
procedures developed in this paper yields valid critical values. Moreover, critical values are valid
uniformly over the class of models $\mathcal{M}_0$. The second part of the Theorem states that the test is
nonconservative in the sense that its level converges to the nominal level $\alpha$.

(ii) The proof technique used in this theorem is based on finite sample approximations that
are built on the results of Chatterjee (2005) and Chernozhukov, Chetverikov, and Kato (2011).
In particular, the validity of the bootstrap is established without referring to the asymptotic
distribution of the test statistic.

(iii) Note that $T$ has the form of a U-statistic. The analysis of such statistics typically requires a
preliminary Hoeffding projection. An advantage of the approximation method developed in this
paper is that it applies directly to the test statistic with no need for the Hoeffding projection,
which simplifies the analysis considerably.

(iv) To obtain a particular application of the general result presented in this theorem, consider a 
basic set of weighting functions introduced in subsection l3.2[ Assume that (log nfl'^l (?i-/imin)^'^^ ~^ 
as n — )• 0. Then the number of weighting functions in the set 5„ is bounded from above by 
some polynomial in n, and so logp < logn. Lemma [3] in the next Section then implies that 
Assumptions m and A[5]hold with (/>i = (under mild conditions on i^(-) stated in Lemma [3]), 
and so the result of Theorem 1 applies for this $S_n$. Therefore, the basic set of weighting functions
yields a test with the correct asymptotic size, and so it can be used for testing monotonicity.
An advantage of this set is that, as will follow from Theorems 4 and 5, it gives a test with the
best attainable rate of uniform consistency in the minimax sense against alternatives with
regression functions that have Lipschitz-continuous first order derivatives provided that $h_{\min} \to 0$
sufficiently fast.

Let $s_l = \inf_{1 \le i < \infty} X_i$ and $s_r = \sup_{1 \le i < \infty} X_i$. To prove consistency of the test and to derive
the rate of consistency against one-dimensional alternatives, I will also incorporate the following
assumptions.

A6. For any interval $[x, x + \Delta_x] \subset [s_l, s_r]$ there exists an integer $N$ and a constant $C > 0$ such
that for any $n \ge N$, $|\{i = 1, \dots, n : X_i \in [x, x + \Delta_x]\}| \ge Cn$.

This Assumption often appears in the literature. Lemma 1 in the next section shows that it
holds almost surely if $\{X_i\}$ is an i.i.d. sequence from some distribution satisfying mild regularity
conditions.

A7. For any interval $[x, x + \Delta_x] \subset [s_l, s_r]$ there exists an integer $N$ and a constant $C > 0$ such that
for any $n \ge N$, there exists $s \in S_n$ satisfying (i) the support of $Q(\cdot, \cdot, s)$ is contained in $[x, x + \Delta_x]^2$,
(ii) $Q(\cdot, \cdot, s)$ is bounded from above uniformly over $n = 1, \dots, \infty$, (iii) there exist nonintersecting
subintervals $[x_l, x_l + \Delta_{x,l}]$ and $[x_r, x_r + \Delta_{x,r}]$ of $[x, x + \Delta_x]$ such that $Q(x_1, x_2, s) \ge C$ whenever
$x_1 \in [x_l, x_l + \Delta_{x,l}]$ and $x_2 \in [x_r, x_r + \Delta_{x,r}]$.
Let $\mathcal{M}_1$ be a subset of $\mathcal{M}$ consisting of all models satisfying Assumptions A6 and A7. Then

Theorem 2. Let P = PI, OS, or SD. Then for any model $M$ from the class $\mathcal{M}_1$ such that $f$
is continuously differentiable and there exist $x_1, x_2 \in [s_l, s_r]$ such that $x_1 < x_2$ and $f(x_1) > f(x_2)$
($H_0$ is false),

$$P_M(T \le c^P_{1-\alpha}) \to 0 \text{ as } n \to \infty.$$

Comment 2. (i) This Theorem shows that the test is consistent against any fixed continuously
differentiable alternative.

(ii) To compare the critical values based on the selection procedures developed in this paper
with the plug-in approach (no selection procedure), assume that $f$ is continuously differentiable
and strictly increasing ($H_0$ holds). Then an argument like that used in the proof of Theorem
2 shows that $S_n^{OS}$ and $S_n^{SD}$ will be empty w.p.a.1, which means that $P\{c^{OS}_{1-\alpha} = 0\} \to 1$ and
$P\{c^{SD}_{1-\alpha} = 0\} \to 1$. On the other hand, $P(c^{PI}_{1-\alpha} > C) \to 1$ for some $C > 0$ since each test
statistic contains at least one weighting function. Moreover, under Assumption A7, it follows
from the Sudakov-Chevet Theorem (see, for example, Theorem 2.3.5 in Dudley (1999)) that
$P(c^{PI}_{1-\alpha} > C) \to 1$ for all $C > 0$. Finally, under Assumption A9, which is stated below, it follows
from the proof of Lemma 2.3.15 in Dudley (1999) that $P\{c^{PI}_{1-\alpha} > C\sqrt{\log n}\} \to 1$ for some $C > 0$.
This explains the power improvements of one-step and stepdown approaches in comparison with
the plug-in critical value.

Theorem 3. Let P = PI, OS, or SD. Consider any model $M$ from the class $\mathcal{M}_1$ such that $f$
is continuously differentiable and there exist $x_1, x_2 \in [s_l, s_r]$ such that $x_1 < x_2$ and $f(x_1) > f(x_2)$
($H_0$ is false). Assume that for every sample size $n$, the true model $M_n$ coincides with $M$ except
that the regression function has the form $f_n(\cdot) = l_n f(\cdot)$ for some sequence $\{l_n\}$ of positive numbers
converging to zero. Then

$$P_{M_n}(T \le c^P_{1-\alpha}) \to 0 \text{ as } n \to \infty$$

as long as $\log p = o(l_n^2 n)$.

Comment 3. (i) This Theorem establishes the consistency of the test against one-dimensional 
local alternatives, which are often used in the literature to investigate the power of the test; 
see, for example, Andrews and Shi (2010), Lee, Song, and Whang (2011), and the discussion in 
Horowitz and Spokoiny (2001). 

(ii) Suppose that $S_n$ consists of a basic set of weighting functions and $h_{\min} \to 0$ polynomially
fast. Then $\log p \le C \log n$, and so the test is consistent against one-dimensional local alternatives
if $(\log n/n)^{1/2} = o(l_n)$.

(iii) Now suppose that $S_n$ is a maximal subset of a basic set such that for any $x_1, x_2, h$ satisfying
$(x_1, h) \in S_n$ and $(x_2, h) \in S_n$, $|x_2 - x_1| > 2h$. In addition, assume that $h_{\min} \to 0$ arbitrarily
slowly. Then the test is consistent against one-dimensional local alternatives if $n^{-1/2} = o(l_n)$.
In words, this test is $\sqrt{n}$-consistent against such alternatives. I note, however, that the practical
value of this $\sqrt{n}$-consistency is limited because there is no guarantee that for any given sample
size $n$ and given deviation from $H_0$, weighting functions suitable for detecting this deviation are
already included in the test statistic. In contrast, it will follow from Theorem 4 that the test
based on a basic set of weighting functions does provide this guarantee.

Let $\{C_j : j = 3, \dots, 8\}$ be a set of strictly positive constants such that $C_3 < C_4$, $C_5 < C_6$,
and $C_7 < C_8$. Let $L > 0$, $\beta \in (0, 1]$, $k \ge 0$, and $h_n = (\log p/n)^{1/(2\beta+3)}$. To derive the uniform
consistency rate against the classes of alternatives with Lipschitz derivatives, conditions A6 and
A7 will be replaced by the following assumptions.

A8. There exists an integer N such that for any n ^ N and any interval [xi,X2] C [s/,Sr] 
satisfying \x2 — xi\ ^ C^n'^^^ , C^n\x2 — xil ^ |{i = 1, n : Xj G [xi, a;2]}| ^ CQn\x2 — xi\. 

This Assumption is stronger than A6 but is still often imposed in the literature; see Lemma 1
for sufficient primitive conditions.

A 9. There exists an integer N such that for any n ^ N and any x G [si,Sr — C^hy^, there 
exists s ^ Sn satisfying (i) the support ofQ{-,-,s) is contained in [x, x + C^hn]'^ , (H) Q{-,-,s) is 
hounded from above by Cgh^, (Hi) there exist xi,Xr G [x,x + C^^hn] such that \xr — xi\ > 2C^hn 
and Q{xi,X2, s) ^ Cr/i^ whenever xi G [xi,xi + C^hn] and X2 G [xr,Xr + C^hn]- 

This Assumption is satisfied for the basic set of weighting functions if $h_{\min}$ satisfies $h_{\min} =
o(\log p/n)^{1/(2\beta+3)}$. Let $f^{(1)}(\cdot)$ denote the first derivative of $f(\cdot)$.

A10. For any $x_1, x_2 \in [s_l, s_r]$, $|f^{(1)}(x_1) - f^{(1)}(x_2)| \le L|x_1 - x_2|^\beta$.

This is a smoothness condition that requires that the regression function is sufficiently
well-behaved.

Let $\mathcal{M}_2$ be the subset of $\mathcal{M}$ consisting of all models satisfying Assumptions A8, A9, and A10.
Then

Theorem 4. Let P = PI, OS, or SD. Consider any sequence of positive numbers $\{l_n\}$ such that
$l_n \to \infty$, and let $\mathcal{M}_{2n}$ denote the subset of $\mathcal{M}_2$ consisting of all models such that the regression
function $f$ satisfies $\inf_{x \in [s_l, s_r]} f^{(1)}(x) < -l_n(\log p/n)^{\beta/(2\beta+3)}$. Then

$$\sup_{M \in \mathcal{M}_{2n}} P_M(T \le c^P_{1-\alpha}) \to 0 \text{ as } n \to \infty.$$

Comment 4. (i) Theorem 4 gives the rate of uniform consistency of the test against Hölder
smoothness classes with parameters $(\beta + 1, L)$. The importance of uniform consistency against
sufficiently large classes of alternatives such as Hölder smoothness classes was previously emphasized
in Horowitz and Spokoiny (2001). Intuitively, it guarantees that there are no reasonable
alternatives against which the test has low power if the sample size is sufficiently large.

(ii) Suppose that 5„ consists of a basic set of weighting functions, K(-) is continuous and strictly 
positive on (— 1,+1), and /imin satisfies /imin = o(log n/n)^/(^^+'^) and (log n)'^/^/(n/i^i^)-'^/^ = 
0(1). Then Assumption ^[5] holds. In addition, it follows from Lemma [3] that Assumptions 
AH] and AO are satisfied (under mild conditions on K{-) stated in Lemma [3l), and so Theorem [J] 
implies that the test with this Sn is consistent whenever inf^-gj^^ fn\x) < — Z„(log n/n)''/^^''^^^ 
for some — >• co. On the other hand, it will be shown in Theorem [5] that no test can be consistent 
if inf^,gj3^ fn\x) > — C(logn/n)^/(^^~''^) for some sufficiently large C > 0. Therefore, the test 
is rate optimal in the minimax sense. 

To conclude this Section, I present a Theorem that gives a lower bound on the possible rate
of uniform consistency against the class $\mathcal{M}_2$ so that no test that maintains asymptotic size can
have a higher rate of uniform consistency. Let $\psi = \psi(Y_1, \dots, Y_n)$ be a generic test. In other words,
$\psi(Y_1, \dots, Y_n)$ is the probability that the test rejects upon observing the data $Y_i$, $i = 1, \dots, n$. Note
that for any deterministic test $\psi = 0$ or $1$.

Theorem 5. For any test $\psi$ satisfying $E_M[\psi] \le \alpha + o(1)$ as $n \to \infty$ for all models $M \in \mathcal{M}$
such that $H_0$ holds, there exists a sequence of models $M = M_n$ belonging to the class $\mathcal{M}_2$ such
that $f = f_n$ satisfies $\inf_{x \in [s_l, s_r]} f_n^{(1)}(x) < -C(\log n/n)^{\beta/(2\beta+3)}$ for some sufficiently large constant
$C > 0$ and $E_{M_n}[\psi] \le \alpha + o(1)$ as $n \to \infty$. Here $E_{M_n}[\cdot]$ denotes the expectation under the
distributions of the model $M_n$.

Comment 5. Combining the result of this Theorem with Comment 4-ii shows that the test based
on a basic set of weighting functions and satisfying conditions of Comment 4-ii is rate optimal. In
other words, no test that maintains asymptotic size can have a higher uniform consistency rate
against the models with the regression function possessing a Lipschitz-continuous first order
derivative.

5. Verification of High-Level Conditions 

This section provides conditions that are sufficient for the Assumptions used in Section 4.
First, I discuss Assumptions A6 and A8 concerning the configuration of design points $\{X_i\}$.
Then I consider Assumption A3, which concerns the uniform consistency of the estimator $\hat\sigma_i$
of $\sigma_i$ over $i = 1, \dots, n$. Finally, I give an upper bound on the sensitivity parameter $A_n$ and prove
Assumption A4 for the case when $S_n$ consists of kernel weighting functions.

Recall that the analysis in Section 4 is for nonstochastic $\{X_i\}$. Alternatively, it can be viewed
as conditional on $\{X_i\}$. Suppose that $\{X_i\}$ is an i.i.d. sample from some distribution. The
Lemma below provides sufficient conditions so that Assumptions A6 and A8 hold for almost all
realizations $\{X_i\}$.

Lemma 1. Suppose that $\{X_i\}_{1 \le i < \infty}$ is an i.i.d. sample from a distribution on $\mathbb{R}$ with
the bounded support $[s_l, s_r]$. Then Assumption A6 holds for almost all realizations $\{X_i\}_{1 \le i < \infty}$.
In addition, if this distribution is absolutely continuous with respect to Lebesgue measure, and its density is
bounded from above and away from zero on the support, then Assumption A8 holds for almost
all realizations $\{X_i\}_{1 \le i < \infty}$.

Note that sufficient conditions provided by Lemma 1 for Assumption A6 allow for point masses,
whereas conditions for Assumption A8 do not.

From now on, I will again assume that $\{X_i\}$ is nonstochastic. The next Lemma shows uniform
consistency of the local version of Rice's estimator $\hat\sigma_i$ with an explicit rate of convergence in
probability.

Lemma 2. Suppose that is the local version of Rice's estimator of ai given in equation 
Suppose also that (i) Assumption ^^holds, (ii) \ogn = o{n'^/'^^~^'^^b\) for some sequence of 
positive numbers converging to zero, (Hi) |J(i)| ^ Cnbn for some C > and all i = l,n, (iv) 
\f{Xi) - f{Xj)\ < \Xi - Xj\ uniformly over i,j = l,n, and (v) \af - a'j\ < \Xi - Xj\ uniformly 
over i,j = l,n. Then maxi^j^„ \ai — crj| = Opibn)- 

Note that since + (/>) G (0,1), Assumption (iii) follows from Al8l and Assumption (iv) 
follows from A [TO] as long as {Xi\ is contained in the bounded set. Lemma [2] implies that 
Assumption Al3]holds for the local version of Rice's estimator with any K2 satisfying K2 < 0/(12 + 
3</>). 

Next, I consider restrictions on the weighting functions to ensure that Assumption A4 holds
and give an upper bound on the sensitivity parameter $A_n$.

Lemma 3. Suppose that Sn consists of kernel weighting functions. In addition, suppose that 
(i) Assumptions 421 and 43 hold, (ii) K has the support [— 1,+1], is continuous, and strictly 
positive on the interior of its support, (iii) x G [si,Sr] for all {x,h) £ Sn, (iv) ^T-^min ~^ °° 
where hmin = min^^^ h^^g^h, and (v) /imax ^ (sr - s;)/2 where /imax = max(^. /i. Then 
(a) An ^ C/(n/ijnin)^/^ where C depends only on the kernel K and constants Ci,...,Cs; (b) if 
Assumption 43 is satisfied, then Assumption holds with K3 = K2; (c) if Assumption 43 is 
satisfied, then Assumption holds with any < {2 + (f))/{4 + 0i) for any (f)i G (—2, </>) as long 
as \ogp = oih^inn^-'^''^) and logp = o{hrainn^'^^'^'^^^'^^'*"^~'"'')- 

^Recall that in Section 4, $s_l$ and $s_r$ were defined by $s_l = \inf_{1 \le i < \infty} X_i$ and $s_r = \sup_{1 \le i < \infty} X_i$. It is easy to show
that the definition given in Lemma 1 coincides with that definition for almost all realizations $\{X_i\}_{1 \le i < \infty}$.

Restrictions on the kernel $K$ imposed in Lemma 3 are satisfied for most commonly used
kernel functions including uniform, triangular, Epanechnikov, biweight, triweight, and tricube
kernels. Note, however, that these restrictions exclude higher order kernels since those are
necessarily negative at some points on their supports.



6. Models with Multivariate Covariates

Most empirical studies contain additional covariates that should be controlled for. In this
section, I extend the results presented in Sections 4 and 5 to allow for this possibility. I consider
cases of both partially linear and nonparametric models. For brevity, I will only consider the
results concerning size properties of the test, and I will assume that $\sigma_i$ is estimated by $\hat\sigma_i = \hat\varepsilon_i$
for all $i = 1, \dots, n$. The power properties of the test can be obtained using arguments closely
related to those used in Theorems 2, 3, and 4.

6.1. Partially Linear Model. In this model, additional covariates enter the regression function
as an additively separable linear form. In other words, the model is given by

$$Y_i = f(X_i) + Z_i^T\beta + \varepsilon_i, \quad i = 1, 2, 3, \dots$$

where $\{Y_i, X_i, \varepsilon_i\}$ are defined as in the Introduction, $\{Z_i\} \subset \mathbb{R}^d$ is a sequence of nonstochastic
additional covariates, and $\beta \in \mathbb{R}^d$ is a vector of coefficients. As above, the problem is to test
the null hypothesis, $H_0$, that $f(x)$ is nondecreasing against the alternative, $H_a$, that there are
$x_1$ and $x_2$ such that $x_1 < x_2$ but $f(x_1) > f(x_2)$.

An advantage of the partially linear model outlined above over the fully nonparametric model
is that it does not suffer from the curse of dimensionality, which decreases the power of the test
and may be a severe problem if the researcher has many additional covariates to control for. On
the other hand, the partially linear model does not allow for heterogeneous effects of the factor
$X$, which might be restrictive in some applications. It should be taken into account that the test
obtained for the partially linear model will be inconsistent if this model is misspecified.

Let me now describe the test. The idea behind the test is to estimate $\beta$ by $\hat\beta$ and to apply
the methods described in Section 3 for the dataset $\{X_i, Y_i - Z_i^T\hat\beta\}$. More precisely, let $\hat\beta$ be a
$\sqrt{n}$-consistent estimator of $\beta$. For example, one can take an estimator of Robinson (1988), which
is

$$\hat\beta = \Big(\sum_{i=1}^n \hat Z_i \hat Z_i^T\Big)^{-1} \Big(\sum_{i=1}^n \hat Z_i \hat Y_i\Big)$$

where $\hat Z_i = Z_i - \hat E[Z|X = X_i]$, $\hat Y_i = Y_i - \hat E[Y|X = X_i]$, and $\hat E[Z|X = X_i]$ and $\hat E[Y|X = X_i]$
are nonparametric estimators of $E[Z|X = X_i]$ and $E[Y|X = X_i]$ respectively; see discussion in
Horowitz (2009) for a set of regularity conditions underlying $\sqrt{n}$-consistency of this estimator.
Define $\tilde Y_i = Y_i - Z_i^T\hat\beta$, and let the test statistic be $T = T(\{X_i, \tilde Y_i\}, \{\hat\sigma_i\}, S_n)$ where $\hat\sigma_i = \hat\varepsilon_i =
Y_i - \hat f(X_i) - Z_i^T\hat\beta$ and $\hat f(X_i)$ is some estimator of $f(X_i)$, which is uniformly consistent over
$i = 1, \dots, n$. The critical value for the test is simulated by one of the methods (plug-in, one-step,
or stepdown) described in Section 3 using the data $\{X_i, \tilde Y_i\}$, estimators $\{\hat\sigma_i\}$, and the set of
weighting functions $S_n$. As in Section 3, let $c^{PI}_{1-\alpha}$, $c^{OS}_{1-\alpha}$, and $c^{SD}_{1-\alpha}$ denote the plug-in, one-step,
and stepdown critical values correspondingly.
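
The first stage of this procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: the conditional expectations are estimated here by a Nadaraya-Watson smoother with a Gaussian kernel, which is one possible choice of nonparametric estimator and is an assumption of the sketch, as are the function names and the bandwidth argument.

```python
import numpy as np

def nw_smooth(x, v, bandwidth):
    """Nadaraya-Watson estimate of E[v | X = X_i] at every design point (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    w = w / w.sum(axis=1, keepdims=True)           # each row of weights sums to one
    return w @ v

def robinson_beta(x, z, y, bandwidth):
    """Regress the residualized Y on the residualized Z by least squares."""
    z_res = z - nw_smooth(x, z, bandwidth)         # Z_i minus estimated E[Z | X = X_i]
    y_res = y - nw_smooth(x, y, bandwidth)         # Y_i minus estimated E[Y | X = X_i]
    beta_hat, *_ = np.linalg.lstsq(z_res, y_res, rcond=None)
    return beta_hat
```

Given `beta_hat`, the adjusted outcome `y - z @ beta_hat` plays the role of $\tilde Y_i$ and can be passed to the univariate test together with the residual-based $\hat\sigma_i$.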

Let $C_9 > 0$ be some constant. To obtain results for partially linear models, I will impose the
following condition.

All. (i) \\Zi\\ ^ Cg for all i = l,n, (ii) limc~^ooPi\\l3 — /3|| > Cn^^/^) — )• uniformly over all 
n, and (in) max^g5„ Ei^ijXn <3(^i> = o{^Jn/\ogp). 

Let $\mathcal{M}_{PL}$ denote any set of models in $\mathcal{M}$ such that Assumptions A2 and A11 are satisfied
uniformly over $\mathcal{M}_{PL}$. It follows from the proof of Lemma 3 that Assumption A11-iii is satisfied
if $S_n$ consists of kernel weighting functions as long as $h_{\max}$ satisfies $h_{\max} = o(1/\log p)$. The size
properties of the test are given in the following theorem.

Theorem 6. Let P = PI, OS, or SD. Let $\mathcal{M}_{PL,0}$ denote the set of all models $M \in \mathcal{M}_{PL}$
satisfying $H_0$. Then

$$\inf_{M \in \mathcal{M}_{PL,0}} P_M(T \le c^P_{1-\alpha}) \ge 1 - \alpha + o(1) \text{ as } n \to \infty.$$

In addition, let $\mathcal{M}_{PL,00}$ denote the set of all models $M \in \mathcal{M}_{PL,0}$ such that $f = C$ for some
constant $C$. Then

$$\sup_{M \in \mathcal{M}_{PL,00}} P_M(T \le c^P_{1-\alpha}) = 1 - \alpha + o(1) \text{ as } n \to \infty.$$

6.2. Nonparametric Model. In this subsection, I do not assume that the regression function 
is separably additive in additional covariates. Instead, I assume that the regression function has 
a general nonparametric form, and so the model is given by 

$$Y_i = f(X_i, Z_i) + \varepsilon_i, \quad i = 1, 2, 3, \dots$$

where $\{X_i, Z_i\}$ is a sequence of $1 + d$ vectors of nonstochastic covariates, $\{Y_i\}$ is a sequence
of scalar dependent random variables, and $\{\varepsilon_i\}$ is a sequence of unobservable scalar random
variables satisfying $E[\varepsilon_i] = 0$ for all $i = 1, \dots, n$.

Let $S_z$ be some subset of $\mathbb{R}^d$. The null hypothesis, $H_0$, to be tested is that for any $x_1, x_2 \in \mathbb{R}$
and $z \in S_z$, $f(x_1, z) \le f(x_2, z)$ whenever $x_1 \le x_2$. The alternative, $H_a$, is that there are
$x_1, x_2 \in \mathbb{R}$ and $z \in S_z$ such that $x_1 \le x_2$ but $f(x_1, z) > f(x_2, z)$.

The choice of the set $S_z$ is up to the researcher and has to be made depending on theoretical
considerations. For example, if $S_z = \mathbb{R}^d$, then $H_0$ means that the function $f$ is increasing in the
first argument for any given value of the second argument. If the researcher is interested in one
particular value, say, $z_0$, then she can set $S_z = z_0$, which will mean that under $H_0$, the function
$f$ is increasing in the first argument when the second argument equals $z_0$.

The advantage of the nonparametric model studied in this subsection is that it is fully flexible 
and, in particular, allows for heterogeneous effects of X on Y. On the other hand, the nonpara- 
metric model suffers from the curse of dimensionality and may result in tests with low power if 
the researcher has many additional covariates. In this case, it might be better to consider the 
partially linear model studied above. 

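To define the test statistic, let $S_n$ and $Q(\cdot, \cdot, s)$ be the same as in Section 3. Then define

$$\bar S_n = \{(s, z) : s \in S_n,\ z = Z_i \text{ for some } i = 1, \dots, n \text{ such that } Z_i \in S_z\},$$

and for $\bar s = (s, z) \in \bar S_n$, let

$$b(\bar s) = (1/2) \sum_{1 \le i,j \le n} (Y_i - Y_j)\,\mathrm{sign}(X_j - X_i)\,\bar Q(X_i, Z_i, X_j, Z_j, \bar s)$$

be a test function where

$$\bar Q(X_i, Z_i, X_j, Z_j, \bar s) = Q(X_i, X_j, s)\,\bar K((Z_i - z)/\bar h(\bar s))\,\bar K((Z_j - z)/\bar h(\bar s)),$$

$\bar K$ is some positive compactly supported auxiliary kernel function, and $\bar h(\bar s)$, $\bar s \in \bar S_n$,
are auxiliary bandwidth values. Intuitively, $\bar Q$ is a local-in-$z$ version of the weighting function $Q$.
The variance of $b(\bar s)$ is given by

$$V(\bar s) = \sum_{1 \le i \le n} \sigma_i^2 \Big( \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,\bar Q(X_i, Z_i, X_j, Z_j, \bar s) \Big)^2$$

and the estimated variance is

$$\hat V(\bar s) = \sum_{1 \le i \le n} \hat\sigma_i^2 \Big( \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,\bar Q(X_i, Z_i, X_j, Z_j, \bar s) \Big)^2.$$

Then the test statistic is

$$T = \max_{\bar s \in \bar S_n} \frac{b(\bar s)}{\sqrt{\hat V(\bar s)}}.$$

Large values of $T$ indicate that $H_0$ is violated. The critical value for the test can be calculated
using any of the methods described in Section 3 with the only difference being that now $\bar Q$, $\bar s$
and $\bar S_n$ should be used instead of $Q$, $s$ and $S_n$, and the selection procedures choose subsets of
$\bar S_n$ instead of $S_n$. Let $c^{PI}_{1-\alpha}$, $c^{OS}_{1-\alpha}$, and $c^{SD}_{1-\alpha}$ denote the plug-in, one-step, and stepdown critical
values correspondingly. In addition, let

$$\bar A_n = \max_{\bar s \in \bar S_n} \max_{1 \le i \le n} \Big| \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,\bar Q(X_i, Z_i, X_j, Z_j, \bar s)/(V(\bar s))^{1/2} \Big|$$

be a sensitivity parameter. Finally, let $\bar p = |\bar S_n|$, the number of elements in the set $\bar S_n$. Clearly,
$\bar p \le pn$ where $p = |S_n|$.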
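
The local-in-$z$ weighting can be illustrated with a short sketch. It is an illustration under assumptions not taken from the paper: the weighting function in $x$ is a product of Epanechnikov-type kernels centered at `x0` with bandwidth `h`, and the auxiliary kernel for a vector $z$ is a product of the same univariate kernel over coordinates. The function names are hypothetical.

```python
import numpy as np

def epan(u):
    """Epanechnikov-type kernel: positive on (-1, 1), zero outside."""
    return np.maximum(1.0 - u ** 2, 0.0)

def local_in_z_rows(x, z, x0, h, z0, h_bar):
    """Vector with entries sum_j sign(X_j - X_i) Qbar(X_i, Z_i, X_j, Z_j, (s, z0)).

    x has shape (n,), z has shape (n, d), and z0 has shape (d,).
    """
    kx = epan((x - x0) / h)                          # kernel weight in the x direction
    kz = np.prod(epan((z - z0) / h_bar), axis=1)     # product kernel in z (an assumption)
    w = kx * kz
    sgn = np.sign(x[None, :] - x[:, None])           # sgn[i, j] = sign(X_j - X_i)
    return (sgn * np.outer(w, w)).sum(axis=1)
```

Observations whose $Z_i$ lies farther than `h_bar` from `z0` in any coordinate receive zero weight, so the comparison of $Y_i$ and $Y_j$ is effectively restricted to pairs with similar values of the additional covariates.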

Let $C_{10}$ be some positive constant. To prove results concerning the multivariate nonparametric
model, I will impose the following condition.

A12. (i)P{\ei\ ^ n) ^ exp(-n^/Cio) forallu ^ andui ^ C2 for alii = l,n, (ii) An{logpf^^ = 
0(1), (Hi) logp/n'^i^''^ = 0(1), (iv) for some -2 < 0i < 00, (logp)^ /n'-^+'I'^/^'^+'f'^^ = o{l), 
andi2„2/(4+0O(iog^)3 = 0(1), (v) h{s)Y.i^^^,^nQ{Xr,Z,,X,,Z,,-s)/{V{s)Yl^ = o{1/^WP) 
uniformly over s £ Sn, and (vi) the regression function f has uniformly bounded first order 
partial derivatives. 

Condition (i) of this Assumption imposes that $\varepsilon_i$ have sub-Gaussian tails, which is stronger than
Assumption A1. Conditions (ii)-(v) are of high level. To give more primitive conditions,
assume that $S_n$ consists of kernel weighting functions so that $\bar s = (s, z) = ((x, h), z)$, and that
the number of points contained in each cube with the center $(X_j, Z_j)$, $j = 1, \dots, n$,

and edges h x h{s)'^ is bounded from below and from above by Chh{sY and Chh{sY corre- 
spondingly with some constants < C < C < 00. Then logp < logn. Let Sn,h = {{h^h) ■ 
h = h{{x,h),z) for some x and z such that ({x,h),z) G Then it follows from the proof of 

Lemma[3]that An ^ max^f^ h)&s„ h ^/ {'^^^'^Y^'^ ■ Therefore, by setting (/>i sufficiently large, condi- 
tions (ii)-(v) hold if nhh!^ — )• cxd and nhh'^^'^ — t- polynomially fast uniformly over [h, h) G Sn^h- 

The key difference between the multivariate case studied in this section and the univariate case
studied in Section 4 is that now it is not necessarily the case that $E[b(\bar s)] \le 0$ under $H_0$. The
reason is that the values $f(x_1, z_1)$ and $f(x_2, z_2)$ are noncomparable unless $z_1 = z_2$. This yields a
nonvanishing bias term in the test statistic. Condition (v) of Assumption A12 ensures that this bias
is asymptotically negligible relative to the concentration rate of the test statistic. The difficulty,
however, is that this condition is inconsistent with n^^(logp)^/^ — t- imposed in Assumption 
Al5] (where I replaced An and p by their multivariate analogs An and p). Indeed, condition 
nAni^ogpy^'^ — >• essentially requires nh^h'^'^ — t- 00, and so it contradicts to nhh'^^'^ — )• 0, which 
follows from condition (v) of A I12I To deal with this problem, I impose more stringent moment 
condition All2l-i than that used in Section [H A[TJ This allows me to apply more powerful methods 
developed in Chernozhukov, Chetverikov, and Kato (2012) and replace nA^(logp)^/^ — >• by 
Anilogpy^"^ = 0(1); see Assumption A[T2]-ii. 

Let M.NP denote any set of models such that Assumptions A[2]with f{Xi) and f{Xi) replaced 
by f{Xi, Zi) and f{Xi, Zi), AH] with s and Sn replaced by s and 5„, and A[T2]hold uniformly over 
M.NP- Then 
Theorem 7. Let $P = PI$, $OS$, or $SD$. Let $\mathcal{M}_{NP,0}$ denote the set of all models $M \in \mathcal{M}_{NP}$
satisfying $H_0$. Then

$$\inf_{M \in \mathcal{M}_{NP,0}} P_M(T \le c^P_{1-\alpha}) \ge 1 - \alpha + o(1) \text{ as } n \to \infty.$$

In addition, let $\mathcal{M}_{NP,00}$ denote the set of all models $M \in \mathcal{M}_{NP,0}$ such that $f \equiv C$ for some
constant $C$. Then

$$\sup_{M \in \mathcal{M}_{NP,00}} P_M(T \le c^P_{1-\alpha}) = 1 - \alpha + o(1) \text{ as } n \to \infty.$$

7. Monte Carlo Simulations 

In this section, I provide results of a small simulation study. The aim of the simulation study 
is to shed some light on the size properties of the test in finite samples and to compare its power
with that of other tests developed in the literature. In particular, I consider the tests of Gijbels, 
Hall, Jones, and Koch (2000) (GHJK), Ghosal, Sen, and van der Vaart (2000) (GSV), and Hall 
and Heckman (2000) (HH). 

I consider samples of size $n = 100$, $200$, and $500$ with equidistant nonstochastic $X_i$'s on the
$[-1, 1]$ interval, and regression functions of the form $f(x) = c_1 x - c_2 \phi(c_3 x)$ where $c_1, c_2, c_3 \ge 0$ and
$\phi(\cdot)$ is the pdf of the standard normal distribution. I assume that $\{\varepsilon_i\}$ is a sequence of i.i.d.
zero-mean random variables with standard deviation $\sigma$. Depending on the experiment, $\varepsilon_i$ has
either a normal or a continuous uniform distribution. Four combinations of parameters are studied:
(1) $c_1 = c_2 = c_3 = 0$ and $\sigma = 0.05$; (2) $c_1 = c_3 = 1$, $c_2 = 4$, and $\sigma = 0.05$; (3) $c_1 = 1$, $c_2 = 1.2$,
$c_3 = 5$, and $\sigma = 0.05$; (4) $c_1 = 1$, $c_2 = 1.5$, $c_3 = 4$, and $\sigma = 0.1$. Cases 1 and 2 satisfy $H_0$
whereas cases 3 and 4 do not. In case 1, the regression function is flat, corresponding to the
maximum of the type I error. In case 2, the regression function is strictly increasing. Cases 3
and 4 give examples of regression functions that are mostly increasing but violate $H_0$ in a
small neighborhood near 0. All functions are plotted in figure 2. The parameters were chosen so
as to have nontrivial rejection probability in most cases (that is, bounded away from zero and from
one).
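As a rough illustration, the simulation design described above can be sketched in Python. The names `regression_function`, `CASES`, and `simulate` are illustrative assumptions and do not come from the original study; the uniform noise is taken to be symmetric around zero with half-width $\sigma\sqrt{3}$ so that its standard deviation equals $\sigma$.

```python
import math
import numpy as np

def phi(u):
    """Standard normal pdf."""
    return np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

def regression_function(x, c1, c2, c3):
    """f(x) = c1 * x - c2 * phi(c3 * x)."""
    return c1 * x - c2 * phi(c3 * x)

# (c1, c2, c3, sigma) for the four parameter combinations in the text.
CASES = {
    1: (0.0, 0.0, 0.0, 0.05),
    2: (1.0, 4.0, 1.0, 0.05),
    3: (1.0, 1.2, 5.0, 0.05),
    4: (1.0, 1.5, 4.0, 0.10),
}

def simulate(case, n, dist="normal", seed=0):
    """Draw one sample (X, Y) with an equidistant design on [-1, 1]."""
    c1, c2, c3, sigma = CASES[case]
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    if dist == "normal":
        eps = rng.normal(0.0, sigma, n)
    else:
        # U(-a, a) has standard deviation a / sqrt(3).
        half_width = sigma * math.sqrt(3.0)
        eps = rng.uniform(-half_width, half_width, n)
    return x, regression_function(x, c1, c2, c3) + eps
```

Evaluating `regression_function` on a fine grid is a quick way to confirm which cases are monotone: the increments are nonnegative in cases 1 and 2 and turn negative just to the left of zero in cases 3 and 4.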

Let me describe the tuning parameters for all tests that are used in the simulations. For the
tests of GSV, GHJK, and HH, I tried to follow their instructions as closely as possible. For the
test developed in this paper, I use kernel weighting functions with $k = 0$, $S_n = \{(x,h) : x \in
\{X_1, ..., X_n\}, h \in H_n\}$, and the kernel $K(x) = 0.75(1 - x^2)$ for $x \in (-1, +1)$ and $0$ otherwise.
I use the set of bandwidth values $H_n = \{h = h_{\max} u^l : h \ge h_{\min}, l = 0, 1, 2, ...\}$ with $u = 0.5$ and $h_{\max} = 1$
(the value of $h_{\min}$ is not legible in this copy), and the truncation parameter $\gamma = 0.01$. For the test of GSV, I use the
same kernel $K$ with the bandwidth value $h_n = n^{-1/5}$, which was suggested in their paper, and
I consider their sup-statistic. For the test of GHJK, I use their run statistic maximized
over $k \in \{10(j - 1) + 1 : j = 1, 2, ..., 0.2n\}$ (see the original paper for the explanation of the
notation). For the test of HH, local polynomial estimates are calculated over $r$ in a set of window sizes at every
design point $X_i$. This set is chosen so as to make the results comparable with those for
the test developed in this paper. Finally, I consider two versions of the test developed in this
paper depending on how $\sigma_i$ is estimated. More precisely, I consider the test with $\sigma_i$ estimated
by Rice's method (see equation (6)), which I refer to in the table below as CS (consistent
sigma), and the test with $\hat\sigma_i = \hat\varepsilon_i$, where $\hat\varepsilon_i$ is obtained as the residual from estimating $f$ using
the series method with polynomials of order 5, 6, and 8 when the sample size $n$ is 100, 200,
and 500 respectively, which I refer to in the table below as IS (inconsistent sigma).

Figure 2. Regression Functions Used in Simulations
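The kernel and the geometric bandwidth grid described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and because the value of $h_{\min}$ is not available here, it is left as a user-supplied argument rather than fixed.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 * (1 - u^2) on (-1, 1) and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def bandwidth_grid(h_max, u, h_min):
    """Return {h_max * u^l : h_max * u^l >= h_min, l = 0, 1, 2, ...}.

    h_min is an input chosen by the user (an assumption of this sketch).
    """
    if not (0.0 < u < 1.0) or h_min <= 0.0:
        raise ValueError("need 0 < u < 1 and h_min > 0")
    grid = []
    h = h_max
    while h >= h_min:
        grid.append(h)
        h *= u
    return grid
```

For example, with $h_{\max} = 1$, $u = 0.5$, and a hypothetical lower bound of 0.2, the grid consists of the bandwidths 1, 0.5, and 0.25.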

The rejection probabilities corresponding to nominal size a = 0.1 for all tests are presented 
in table 1. The results are based on 1000 simulations with 500 bootstrap repetitions in all cases 
excluding the test of GSV where the asymptotic critical value is used. 

The results of the simulations can be summarized as follows. First, the results for normal
and uniform disturbances are rather similar. The test developed in this paper with $\sigma_i$ estimated
using Rice's method maintains the required size quite well (given the nonparametric structure
of the problem) and yields size comparable with that of the GSV, GHJK, and HH tests. On
the other hand, the test with $\hat\sigma_i = \hat\varepsilon_i$ does well in terms of size only when the sample
size is as large as 500. When the null hypothesis does not hold, the CS test with the stepdown
critical value yields the highest proportion of rejections in almost all cases. Moreover, in case 3 with
the sample size $n = 200$, this test has much higher power than that of GSV, GHJK, and HH.
Table 1. Proportion of Rejections

Noise    Case  Sample  GSV   GHJK  HH    CS-PI  CS-OS  CS-SD  IS-PI  IS-OS  IS-SD
normal   1     100     .118  .078  .123  .128   .128   .128   .164   .164   .164
normal   1     200     .091  .051  .108  .114   .114   .114   .149   .149   .149
normal   1     500     .086  .078  .105  .114   .114   .114   .133   .133   .133
normal   2     100     0     .001  0     .001   .008   .008   .008   .024   .024
normal   2     200     0     .002  0     .001   .010   .010   .007   .017   .017
normal   2     500     0     .001  0     .002   .007   .007   .005   .016   .016
normal   3     100     0     .148  .033  .259   .436   .433   0      0      0
normal   3     200     .010  .284  .169  .665   .855   .861   .308   .633   .650
normal   3     500     .841  .654  .947  .982   .995   .997   .975   .995   .995
normal   4     100     .037  .084  .135  .163   .220   .223   .023   .042   .043
normal   4     200     .254  .133  .347  .373   .499   .506   .362   .499   .500
normal   4     500     .810  .290  .789  .776   .825   .826   .771   .822   .822
uniform  1     100     .109  .079  .121  .122   .122   .122   .201   .201   .201
uniform  1     200     .097  .063  .109  .121   .121   .121   .160   .160   .160
uniform  1     500     .077  .084  .107  .092   .092   .092   .117   .117   .117
uniform  2     100     .001  .001  0     0      .006   .007   .017   .032   .033
uniform  2     200     0     0     0     .001   .010   .010   .012   .022   .024
uniform  2     500     0     .003  0     .003   .011   .011   .011   .021   .021
uniform  3     100     0     .151  .038  .244   .438   .449   0      0      0
uniform  3     200     .009  .233  .140  .637   .822   .839   .290   .607   .617
uniform  3     500     .811  .582  .947  .978   .994   .994   .975   .990   .990
uniform  4     100     .034  .084  .137  .155   .215   .217   .024   .045   .046
uniform  4     200     .197  .116  .326  .357   .473   .478   .323   .452   .456
uniform  4     500     .803  .20?  .789  .785   .844   .840   .782   .847   .848

Nominal size is 0.1. GSV, GHJK, and HH stand for the tests of Ghosal, Sen, and van der Vaart (2000),
Gijbels, Hall, Jones, and Koch (2000), and Hall and Heckman (2000) respectively. CS-PI, CS-OS, and
CS-SD refer to the test developed in this paper with $\sigma_i$ estimated using Rice's formula and plug-in,
one-step, and stepdown critical values respectively. Finally, IS-PI, IS-OS, and IS-SD refer to the test
developed in this paper with $\sigma_i$ estimated by $\hat\sigma_i = \hat\varepsilon_i$ and plug-in, one-step, and stepdown critical values
respectively.



The CS test also has higher power than that of the IS test. Finally, the table shows that the
one-step critical value gives a notable improvement in terms of power in comparison with the plug-in
critical value. For example, in case 3 with the sample size $n = 200$ and normal disturbances, the one-step critical value
gives an additional 190 rejections out of 1000 simulations in comparison with the plug-in critical value
for the CS test and an additional 325 rejections for the IS test. On the other hand, the stepdown
approach gives only minor improvements over the one-step approach. Overall, the results of
the simulations are consistent with the theoretical findings in this paper. In particular, selection
procedures yielding one-step and stepdown critical values improve power with no size distortions.
Additional simulation results are presented in the supplementary Appendix.

8. Empirical Application 

In this section, I review the arguments of Ellison and Ellison (2011) on how strategic en- 
try deterrence might yield a nonmonotone relation between market size and investment in the 
pharmaceutical industry and then apply the testing procedures developed in this paper to their 
dataset. I start with describing their theory. Then I provide the details of the dataset. Finally, 
I present the results. 

In the pharmaceutical industry, incumbents whose patents are about to expire can use in- 
vestments strategically to prevent generic entries after the expiration of the patent. In order 
to understand how this strategic entry deterrence influences the relation between market size 
and investment levels, Ellison and Ellison (2011) developed two models for an incumbent's in- 
vestment. In the first model, potential entrants do not observe the incumbent's investment but 
they do in the second one. So, a strategic entry deterrence motive is absent in the former model 
but is present in the latter one. Therefore, the difference in incumbent's investment between 
two models is explained by the strategic entry deterrence. Ellison and Ellison showed that in 
the former model, the investment-market size relation is determined by a combination of direct 
and competition effects. The direct effect is positive if increasing the market size (holding en- 
try probabilities fixed) raises the marginal benefit from the investment more than it raises the 
marginal cost of the investment. The competition effect is positive if the marginal benefit of the 
investment is larger when the incumbent is engaged in duopoly competition than it is when the 
incumbent is a monopolist. The equilibrium investment is increasing in market size if and only
if the sum of the two effects is positive. Therefore, a sufficient condition for the monotonicity of the
investment-market size relation is that both effects are of the same sign (the interested reader can
find a more detailed discussion in the original paper). In the latter model,
there is also a strategic entry deterrence effect. The authors noted that this effect should be
relatively less important in small and large markets than it is in markets of intermediate size.
In small markets, there are not enough profits for potential entrants, and there is no need to
prevent entry. In large markets, profits are so large that no reasonable investment levels will be
enough to prevent entries. As a result, strategic entry deterrence might yield a nonmonotonic
relation between market size and investment no matter whether the relation in the model with
no strategic entry deterrence is increasing or decreasing.

Ellison and Ellison studied three types of investment: detail advertising, journal advertising, 
and presentation proliferation. Detail advertising, measured as per-consumer expenditures, refers 
to sending representatives to doctors' offices. Since both revenues and cost of detail advertising 
are likely to be linear in the market size, it can be shown that the direct effect for detail advertis- 
ing is zero. The competition effect is likely to be negative because detail advertising will benefit 
competitors as well. Therefore, it is expected that detail advertising is a decreasing function
of the market size in the absence of strategic distortions. Strategic entry deterrence should decrease
detail advertising for markets of intermediate size. Journal advertising is the placement of
advertisements in medical journals. Journal advertising is also measured as per-consumer expen- 
ditures. The competition effect for journal advertising is expected to be negative for the same 
reason as for detail advertising. The direct effect, however, may be positive because the cost 
per potential patient is probably a decreasing function of the market size. Opposite directions 
of these effects make journal advertising less attractive for detecting strategic entry deterrence 
in comparison with detail advertising. Nevertheless, following the original paper, I assume that 
journal advertising is a decreasing function of the market size in the absence of strategic distortions.
Presentation proliferation is selling a drug in many different forms. Since the benefit of
introducing a new form is approximately proportional to the market size while the costs can be
regarded as fixed, the direct effect for presentation proliferation should be positive. In addition, 
the competition effect is also likely to be positive because it creates a monopolistic niche for the 
incumbent. Therefore, presentation proliferation should be positively related to market size in 
the absence of strategic distortions. 

The dataset consists of 63 chemical compounds, sold under 71 different brand names. All of 
these drugs lost their patent exclusivity between 1986 and 1992. There are four variables in the 
dataset: average revenue for each drug over three years before the patent expiration (this measure 
should be regarded as a proxy for market size), average costs of detail and journal advertising 
over the same time span as revenues, and a Herfindahl-style measure of the degree to which 
revenues are concentrated in a small number of presentations (this measure should be regarded 
as the inverse of presentation proliferation meaning that higher values of the measure indicate 
lower presentation proliferation). 

Clearly, the results will depend on how I define both dependent and independent variables 
for the test. Following the strategy adopted in the original paper, I use log of revenues as 
the independent variable in all cases, and the ratio of advertising costs to revenues for detail 
and journal advertising and the Herfindahl-style measure for presentation proliferation as the 
dependent variable. The null hypothesis is that the corresponding conditional mean function is
decreasing.

I consider the test with kernel weighting functions with $k = 0$ or $1$ and the kernel $K(x) =
0.75(1 - x^2)$ for $x \in (-1, 1)$ and $0$ otherwise. I use the set of bandwidth values $H_n = \{0.5; 1\}$ and
the set of weighting functions $S_n = \{(x,h) : x \in \{X_1, ..., X_n\}, h \in H_n\}$. Implementing the test
requires estimating $\sigma_i^2$ for all $i = 1, ..., n$. Since the test based on Rice's method outperformed that
with $\hat\sigma_i = \hat\varepsilon_i$ in the Monte Carlo simulations, I use this method in the benchmark procedure. I also
check robustness of the results using the following two-step procedure. First, I obtain residuals
of the OLS regression of $Y$ on a set of transformations of $X$. In particular, I use polynomials in
$X$ up to the third degree (cubic polynomial). Second, squared residuals are projected onto the
same polynomial in $X$ using the OLS regression again. The resulting projections are estimators
$\hat\sigma_i^2$ of $\sigma_i^2$, $i = 1, ..., n$.
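The two-step procedure just described can be sketched with ordinary least squares in NumPy. The function name `two_step_variance` and its return convention are illustrative assumptions; the sketch also makes no attempt to force the fitted variances to be positive, which a practical implementation might need to address.

```python
import numpy as np

def two_step_variance(x, y, degree=3):
    """Estimate sigma_i^2 by regressing squared OLS residuals on a polynomial.

    Step 1: OLS of y on (1, x, ..., x^degree) gives residuals.
    Step 2: OLS of squared residuals on the same polynomial gives
            fitted values, which serve as the variance estimates.
    Returns (sigma2_hat, residuals).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    P = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    resid = y - P @ beta
    gamma, *_ = np.linalg.lstsq(P, resid**2, rcond=None)
    sigma2_hat = P @ gamma                          # may be negative in small samples
    return sigma2_hat, resid
```

Setting `degree` to 1, 2, or 3 would correspond to polynomials of first, second, and third degree in $X$.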

The results of the test are presented in table 2. The table shows the p-value of the test for
each type of investment and each method of estimating $\sigma_i^2$. In the table, method 1 corresponds
to estimating $\sigma_i^2$ using Rice's formula, and methods 2, 3, and 4 are based on polynomials of
first, second, and third degrees respectively. Note that all methods yield similar numbers, which
supports the robustness of the results. All the methods with $k = 0$ reject the null hypothesis
that journal advertising is decreasing in market size at the 10% significance level. This may be
regarded as evidence that pharmaceutical companies use strategic investment in the form of
journal advertising to deter generic entries. On the other hand, recall that direct and competition
effects probably have different signs for journal advertising, and so rejecting the null may also
be due to the fact that the direct effect dominates for some values of market size. In addition,
the test with $k = 1$ does not reject the null hypothesis that journal advertising is decreasing in
market size at the 10% significance level, no matter how $\sigma_i$ are estimated. No method rejects the
null hypothesis in the case of detail advertising and presentation proliferation. This may be (1)
because firms do not use these types of investment for strategic entry deterrence, (2) because the
strategic effect is too weak to yield nonmonotonicity, or (3) because the sample size is not large
enough. Overall, the results are consistent with those presented in Ellison and Ellison (2011).



In the original paper, Ellison and Ellison (2011) test the null hypothesis consisting of the union of monoton- 
ically increasing and monotonically decreasing regression functions. The motivation for this modification is that 
increasing regression functions contradict the theory developed in the paper and, hence, should not be considered 
as evidence of the existence of strategic entry deterrence. On the other hand, increasing regression functions 
might arise if the strategic entry deterrence effect outweighs direct and competition effects even in small and large
markets, which could be considered as extreme evidence of the existence of strategic entry deterrence. 
Table 2. Incumbent Behavior versus Market Size: Monotonicity Test p-value

                          Investment Type
Method   Detail Advertising   Journal Advertising   Presentation Proliferation
         k=0      k=1         k=0      k=1          k=0      k=1
1        .120     .111        .056     .120         .557     .661
2        .246     .242        .088     .168         .665     .753
3        .239     .191        .099     .195         .610     .689
4        .301     .238        .098     .194         .596     .695


9. Conclusion 

In this paper, I have developed a general framework for testing monotonicity of a nonpara- 
metric regression function, and have given a broad class of new tests. A general test statistic 
uses many different weighting functions so that an approximately optimal weighting function is 
determined automatically. In this sense, the test adapts to the properties of the model. I have 
also obtained new methods to simulate the critical values for these tests. These are based on 
selection procedures. The procedures are used to estimate what counterparts of the test statistic 
should be used in simulating the critical value. They are constructed so that no violation of the 
asymptotic size occurs. Finally, I have given tests suitable for models with multiple covariates 
for the first time in the literature. 

The new methods have numerous applications in economics. In particular, they can be applied 
to test qualitative predictions of comparative statics analysis including those derived via robust 
comparative statics. In addition, they are useful for evaluating monotonicity assumptions, which 
are often imposed in economic and econometric models, and for classifying economic objects in 
those cases where classification includes the concept of monotonicity (for example, normal/inferior 
and luxury /necessity goods). Finally, these methods can be used to detect strategic behavior of 
economic agents that might cause nonmonotonicity in otherwise monotone relations. 

The attractive properties of the new tests are demonstrated via Monte Carlo simulations. In 
particular, it is shown that the rejection probability of the new tests greatly exceeds that of 
other tests for some simulation designs. In addition, I applied the tests developed in this paper
to study entry deterrence effects in the pharmaceutical industry using the dataset of Ellison and
Ellison (2011). I showed that the investment in the form of journal advertising seems to be used
by incumbents in order to prevent generic entries after the expiration of patents. The evidence 
is rather weak, though. 
Appendix A. Implementation Details

In this section, I provide detailed step-by-step instructions for implementing plug-in, one-step,
and stepdown critical values. The instructions are given for constructing a test of level $\alpha$. In all
cases, let $B$ be a large integer denoting the number of bootstrap repetitions, and let $\{\epsilon_{i,b}\}_{i=1,...,n;\, b=1,...,B}$
be a set of independent $N(0, 1)$ random variables. For one-step and stepdown critical values, let
$\gamma$ denote the truncation probability, which should be small relative to $\alpha$.



A.1. Plug-in Approach.

(1) For each $b = 1, ..., B$ and $i = 1, ..., n$, calculate $Y^*_{i,b} = \hat\sigma_i \epsilon_{i,b}$.

(2) For each $b = 1, ..., B$, calculate the value $T^*_b$ of the test statistic using the sample $\{X_i, Y^*_{i,b}\}_{i=1}^n$.

(3) Define the plug-in critical value, $c^{PI}_{1-\alpha}$, as the $(1 - \alpha)$ sample quantile of $\{T^*_b\}_{b=1}^B$.
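A minimal sketch of the plug-in steps is given below. It assumes that the weights $w_i(s)$ have already been collected in a matrix `W` with one row per weighting function $s$, and that the normalization is $\hat V(s) = \sum_i \hat\sigma_i^2 w_i(s)^2$; both the matrix layout and the function name are assumptions made for illustration.

```python
import numpy as np

def plug_in_critical_value(W, sigma_hat, alpha=0.1, B=500, seed=0):
    """Simulate the plug-in critical value.

    W[s, i]   : weight w_i(s) for weighting function s and observation i
    sigma_hat : array of estimated standard deviations, one per observation
    """
    W = np.asarray(W, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    V_hat = (W**2) @ (sigma_hat**2)          # assumed form of V_hat(s)
    A_hat = W / np.sqrt(V_hat)[:, None]      # normalized weights a_hat_i(s)
    # Step (1): bootstrap samples Y*_{i,b} = sigma_hat_i * eps_{i,b}.
    Y_star = rng.standard_normal((B, n)) * sigma_hat
    # Step (2): T*_b = max_s sum_i a_hat_i(s) Y*_{i,b}.
    T_star = (Y_star @ A_hat.T).max(axis=1)
    # Step (3): (1 - alpha) sample quantile.
    return float(np.quantile(T_star, 1.0 - alpha))
```

With a single weighting function, each simulated statistic is a standard normal draw, so the returned value should be close to the corresponding normal quantile when `B` is large.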



A.2. One-Step Approach.

(1) For each $b = 1, ..., B$ and $i = 1, ..., n$, calculate $Y^*_{i,b} = \hat\sigma_i \epsilon_{i,b}$.

(2) Using the plug-in approach, simulate $c^{PI}_{1-\gamma}$.

(3) Define $S_n^{OS}$ as the set of values $s \in S_n$ such that $b(s)/(\hat V(s))^{1/2} > -2c^{PI}_{1-\gamma}$.

(4) For each $b = 1, ..., B$, calculate the value $T^*_b$ of the test statistic using the sample $\{X_i, Y^*_{i,b}\}_{i=1}^n$
and taking the maximum only over $S_n^{OS}$ instead of $S_n$.

(5) Define the one-step critical value, $c^{OS}_{1-\alpha}$, as the $(1 - \alpha)$ sample quantile of $\{T^*_b\}_{b=1}^B$.



A.3. Stepdown Approach.

(1) For each $b = 1, ..., B$ and $i = 1, ..., n$, calculate $Y^*_{i,b} = \hat\sigma_i \epsilon_{i,b}$.

(2) Using the plug-in and one-step approaches, simulate $c^{PI}_{1-\gamma}$ and $c^{OS}_{1-\gamma}$, respectively.

(3) Denote $S_n^0 = S_n^{OS}$, $c^0 = c^{OS}_{1-\gamma}$, and set $l = 0$.

(4) For given value of $l \ge 0$, define $S_n^{l+1}$ as the set of values $s \in S_n^l$ such that $b(s)/(\hat V(s))^{1/2} >
-c^{PI}_{1-\gamma} - c^l$.

(5) For each $b = 1, ..., B$, calculate the value $T^*_b$ of the test statistic using the sample $\{X_i, Y^*_{i,b}\}_{i=1}^n$
and taking the maximum only over $S_n^{l+1}$ instead of $S_n$.

(6) Define $c^{l+1}$ as the $(1 - \gamma)$ sample quantile of $\{T^*_b\}_{b=1}^B$.

(7) If $S_n^{l+1} = S_n^l$, then go to step (8). Otherwise, set $l = l + 1$ and go to step (4).

(8) For each $b = 1, ..., B$, calculate the value $T^*_b$ of the test statistic using the sample $\{X_i, Y^*_{i,b}\}_{i=1}^n$
and taking the maximum only over $S_n^l$ instead of $S_n$.

(9) Define $c^{SD}_{1-\alpha}$ as the $(1 - \alpha)$ sample quantile of $\{T^*_b\}_{b=1}^B$.
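The one-step and stepdown selection loops can be sketched together, since they reuse the same bootstrap draws. This is a sketch under stated assumptions rather than the author's code: the weight matrix `W` (rows indexed by $s$), the normalization $\hat V(s) = \sum_i \hat\sigma_i^2 w_i(s)^2$, and the stepdown threshold $-c^{PI}_{1-\gamma} - c^l$ are all assumptions of this illustration, as are the function names.

```python
import numpy as np

def _quantile_over(A_hat, Y_star, keep, level):
    """Sample quantile of max over the kept rows; -inf if nothing is kept."""
    if not keep.any():
        return -np.inf
    T_star = (Y_star @ A_hat[keep].T).max(axis=1)
    return float(np.quantile(T_star, level))

def critical_values(W, Y, sigma_hat, alpha=0.1, gamma=0.01, B=500, seed=0):
    """Return the test statistic and plug-in, one-step, stepdown critical values."""
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    rng = np.random.default_rng(seed)
    V_hat = (W**2) @ (sigma_hat**2)              # assumed form of V_hat(s)
    A_hat = W / np.sqrt(V_hat)[:, None]
    stat = A_hat @ Y                             # b(s) / V_hat(s)^{1/2}
    Y_star = rng.standard_normal((B, Y.size)) * sigma_hat

    everything = np.ones(W.shape[0], dtype=bool)
    c_pi = _quantile_over(A_hat, Y_star, everything, 1.0 - alpha)
    c_pi_gamma = _quantile_over(A_hat, Y_star, everything, 1.0 - gamma)

    # One-step selection: keep s with statistic above -2 * c^{PI}_{1-gamma}.
    keep_os = stat > -2.0 * c_pi_gamma
    c_os = _quantile_over(A_hat, Y_star, keep_os, 1.0 - alpha)

    # Stepdown: shrink the selected set until it stops changing.
    keep = keep_os
    c_l = _quantile_over(A_hat, Y_star, keep, 1.0 - gamma)
    while True:
        keep_next = keep & (stat > -c_pi_gamma - c_l)
        if np.array_equal(keep_next, keep):
            break
        keep = keep_next
        c_l = _quantile_over(A_hat, Y_star, keep, 1.0 - gamma)
    c_sd = _quantile_over(A_hat, Y_star, keep, 1.0 - alpha)

    return {"T": float(stat.max()), "PI": c_pi, "OS": c_os, "SD": c_sd}
```

Because each selected set is a subset of the previous one and the same draws are reused, the simulated critical values are ordered as stepdown, one-step, plug-in from smallest to largest, and the loop terminates after finitely many passes.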
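Appendix B. Additional Notation

I will use the following additional notation in Appendices C and D. Recall that $\{\epsilon_i\}$ is a
sequence of independent $N(0, 1)$ random variables that are independent of the data. Denote
$e_i = \sigma_i \epsilon_i$ and $\hat e_i = \hat\sigma_i \epsilon_i$ for $i = 1, ..., n$. Let

$$w_i(s) = \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i) Q(X_i, X_j, s),$$

$$a_i(s) = w_i(s)/(V(s))^{1/2} \text{ and } \hat a_i(s) = w_i(s)/(\hat V(s))^{1/2},$$

$$e(s) = \sum_{1 \le i \le n} a_i(s) e_i \text{ and } \hat e(s) = \sum_{1 \le i \le n} \hat a_i(s) \hat e_i,$$

$$\varepsilon(s) = \sum_{1 \le i \le n} a_i(s) \varepsilon_i \text{ and } \hat\varepsilon(s) = \sum_{1 \le i \le n} \hat a_i(s) \varepsilon_i,$$

$$f(s) = \sum_{1 \le i \le n} a_i(s) f(X_i) \text{ and } \hat f(s) = \sum_{1 \le i \le n} \hat a_i(s) f(X_i).$$

Note that $T = \max_{s \in S_n} \sum_{1 \le i \le n} \hat a_i(s) Y_i = \max_{s \in S_n} (\hat f(s) + \hat\varepsilon(s))$. In addition, for any $S \subset S_n$,
which may depend on the data, and all $\eta \in (0, 1)$, let $c^S_\eta$ denote the conditional $\eta$ quantile of
$T^* = T(\{X_i, Y_i^*\}, \{\hat\sigma_i\}, S)$ given $\{\hat\sigma_i\}$ and $S$ where $Y_i^* = \hat\sigma_i \epsilon_i$ for $i = 1, ..., n$, and let $c^{S,0}_\eta$ denote
the conditional $\eta$ quantile of $T^* = T(\{X_i, Y_i^*\}, \{\sigma_i\}, S)$ given $S$ where $Y_i^* = \sigma_i \epsilon_i$ for $i = 1, ..., n$.
Further, for $\eta \le 0$, define $c^S_\eta$ and $c^{S,0}_\eta$ as $-\infty$, and for $\eta \ge 1$, define $c^S_\eta$ and $c^{S,0}_\eta$ as $+\infty$.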

Moreover, denote V = maXsg5^(y(s)/F(s))^/^. Let {V'n} be a sequence of positive numbers 
converging to zero sufficiently slowly so that (i) logp/n'^^ = o(V'n) (recall that by Assumption 
AO logp/n'^^ = o(l), and so such a sequence exists), (ii) uniformly over 5 C 5„ and rj G (0, 1), 
P(c!^+^„ < c^) = and P(c^^^^ < cj^''^) = o(l) (Lemma [9] establishes existence of such a 
sequence under Assumptions AHJ AlSj AlH and Al5] and Lemma [13] establishes existence under 
Assumptions AlH AEJ AHl and AE]). Let 

For D = PI,OS,SD,R, let = cf" and c^'° = 4"'° where = 5^. Note that c^^'° and 
c^'^ are nonstochastic. 

Finally, I denote the space of $k$-times continuously differentiable functions on $\mathbb{R}$ by $C^k(\mathbb{R}, \mathbb{R})$.
For $g \in C^k(\mathbb{R}, \mathbb{R})$, the symbol $g^{(r)}$ for $r \le k$ denotes the $r$th derivative of $g$, and $\|g^{(r)}\|_\infty =
\sup_{t \in \mathbb{R}} |g^{(r)}(t)|$.

Appendix C. Proofs for section H] 

In this Appendix, I first prove a sequence of auxiliary lemmas (subsection IC.ip . Then I present 
the proofs of the theorems stated in section U] (subsection IC.2P . 
C.1. Auxiliary Lemmas.

Lemma 4. $E[\max_{s \in S_n} |e(s)|] \lesssim (\log p)^{1/2}$.

Proof. Note that by construction, $e(s)$ is distributed as a $N(0, 1)$ random variable, and $|S_n| = p$.
So, the result follows from Lemma 2.2.2 in Van der Vaart and Wellner (1996). □
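The order of magnitude in Lemma 4 can be illustrated by simulation. The sketch below uses independent standard normal coordinates, which is only a special case chosen for convenience (the variables $e(s)$ in the text are generally correlated), and compares the simulated mean of the maximum with the standard bound $\sqrt{2\log(2p)}$; the function name is hypothetical.

```python
import numpy as np

def mean_max_abs_normal(p, draws=2000, seed=0):
    """Monte Carlo estimate of E[max_{1<=j<=p} |Z_j|] for independent N(0,1) Z_j."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((draws, p))
    return float(np.abs(z).max(axis=1).mean())

for p in (10, 100, 1000):
    estimate = mean_max_abs_normal(p)
    bound = np.sqrt(2.0 * np.log(2.0 * p))
    print(p, round(estimate, 3), round(float(bound), 3))
```

The simulated mean grows slowly with $p$ and stays below the bound, in line with the $(\log p)^{1/2}$ rate.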

Lemma 5. Uniformly over $S \subset S_n$ and $\Delta > 0$, $\sup_{t \in \mathbb{R}} P(\max_{s \in S} e(s) \in (t, t + \Delta)) \lesssim \Delta(\log p)^{1/2}$.
In particular, for any $(\eta, \delta) \in (0, 1)^2$ and $S \subset S_n$, $c^{S,0}_{\eta+\delta} - c^{S,0}_\eta \ge C\delta/(\log p)^{1/2}$ for some constant
$C > 0$.

Proof. The first claim follows by combining Lemma 4 in this paper and Theorem 1 in Chernozhukov,
Chetverikov, and Kato (2011). The second claim follows from the result in the first
claim. □

Lemma 6. There exists a constant C > such that for all S C Sn, i] G (0, 1), and t G M, 



c. 



»?-C|i| logp/(l-r;) ^ (1 +*) ^ S+C|t|logP/(l-'?)' 



Proof. Recall that c^'^ is the rj quantile of max^g^ e(s), and so combining Lemma H] and Markov 
inequality shows that c^'^ < (logp)-'^/^/(l — rj). Therefore, Lemma [5] gives 



c 



?;c|*|iogp/(i-,) - ^ C|t|(logp)VV(l - r?) ^ I^K'O 



The lower bound follows similarly. □ 
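Lemma 7. Under Assumption A1, uniformly over $S \subset S_n$, $\beta > 0$, and $g \in C^3(\mathbb{R}, \mathbb{R})$,

$$|E[g(\max_{s \in S} \varepsilon(s)) - g(\max_{s \in S} e(s))]| \lesssim \|g^{(1)}\|_\infty \log p/\beta + nA_n^3(\|g^{(3)}\|_\infty + \beta\|g^{(2)}\|_\infty + \beta^2\|g^{(1)}\|_\infty).$$

Proof. This lemma is closely related to Theorem 1.5 in Chatterjee (2005) but improves the bound.
For $x = (x_1, ..., x_n) \in \mathbb{R}^n$, let $x(s) = \sum_{1 \le i \le n} a_i(s) x_i$. Let $F_\beta : \mathbb{R}^n \to \mathbb{R}$ be given by

$$F_\beta(x) = \beta^{-1} \log\Big(\sum_{s \in S} \exp(\beta x(s))\Big)$$

for all $x \in \mathbb{R}^n$. Then

$$\max_{s \in S} x(s) = \beta^{-1} \log\big(\exp(\beta \max_{s \in S} x(s))\big) \le \beta^{-1} \log\Big(\sum_{s \in S} \exp(\beta x(s))\Big)$$
$$\le \beta^{-1} \log\big(p \exp(\beta \max_{s \in S} x(s))\big) \le \beta^{-1} \log p + \max_{s \in S} x(s),$$

and so

$$|\max_{s \in S} x(s) - F_\beta(x)| \le \beta^{-1} \log p.$$

Therefore,

$$|E[g(\max_{s \in S} \varepsilon(s)) - g(\max_{s \in S} e(s))]| \le 2\|g^{(1)}\|_\infty \log p/\beta + |E[g(F_\beta(\varepsilon)) - g(F_\beta(e))]|$$

where $\varepsilon = (\varepsilon_1, ..., \varepsilon_n)$ and $e = (e_1, ..., e_n)$. For $j = 1, ..., n$, let $x^j = (\varepsilon_1, ..., \varepsilon_j, e_{j+1}, ..., e_n)$, and let
$x^0 = e$. Then

$$|E[g(F_\beta(\varepsilon)) - g(F_\beta(e))]| \le \sum_{1 \le j \le n} |E[g(F_\beta(x^j)) - g(F_\beta(x^{j-1}))]|.$$

Let $m : \mathbb{R}^n \to \mathbb{R}$ be given by $m(x) = g(F_\beta(x))$ for all $x \in \mathbb{R}^n$, and let $\partial_j^k m$ denote the $k$-th partial
derivative of $m$ with respect to the argument $j$. Then a Taylor expansion yields

$$g(F_\beta(x^j)) - g(F_\beta(x^{j-1})) = \partial_j m(x^{j,0})(\varepsilon_j - e_j) + \partial_j^2 m(x^{j,0})(\varepsilon_j^2 - e_j^2)/2 + \partial_j^3 m(x^{j,1})\varepsilon_j^3/6 - \partial_j^3 m(x^{j,2})e_j^3/6$$

for some $n$-vectors $x^{j,1}$ and $x^{j,2}$ where $x^{j,0} = (\varepsilon_1, ..., \varepsilon_{j-1}, 0, e_{j+1}, ..., e_n)$. Since $\varepsilon_j$ and $e_j$
are jointly independent of $x^{j,0}$ and $E[\varepsilon_j] = E[e_j] = 0$, $E[\partial_j m(x^{j,0})(\varepsilon_j - e_j)] = 0$. In addition,
$E[\partial_j^2 m(x^{j,0})(\varepsilon_j^2 - e_j^2)/2] = 0$ because $E[\varepsilon_j^2] = E[e_j^2] = \sigma_j^2$. So, by Assumption A1,

$$|E[g(F_\beta(x^j)) - g(F_\beta(x^{j-1}))]| \lesssim \sup_{x \in \mathbb{R}^n} |\partial_j^3 m(x)|.$$

Finally, simple algebra shows that

$$\sup_{x \in \mathbb{R}^n} |\partial_j^3 m(x)| \lesssim A_n^3(\|g^{(3)}\|_\infty + \beta\|g^{(2)}\|_\infty + \beta^2\|g^{(1)}\|_\infty).$$

Combining presented inequalities gives the asserted claim. □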
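The smooth-max approximation used in the proof of Lemma 7 is easy to check numerically. The sketch below computes $F_\beta(x) = \beta^{-1}\log\sum_s \exp(\beta x(s))$ for a generic vector of values (subtracting the maximum before exponentiating for numerical stability) and reports the gap $F_\beta(x) - \max_s x(s)$, which should lie between $0$ and $\beta^{-1}\log p$. The function name and the test vector are illustrative assumptions.

```python
import numpy as np

def smooth_max(values, beta):
    """F_beta = (1 / beta) * log(sum_s exp(beta * values[s])), computed stably."""
    values = np.asarray(values, dtype=float)
    m = values.max()
    return float(m + np.log(np.exp(beta * (values - m)).sum()) / beta)

rng = np.random.default_rng(0)
values = rng.standard_normal(200)            # stands in for {x(s) : s in S}, p = 200
for beta in (1.0, 10.0, 100.0):
    gap = smooth_max(values, beta) - values.max()
    print(beta, gap, np.log(values.size) / beta)
```

Larger values of $\beta$ tighten the approximation to the maximum at the cost of larger derivatives of $F_\beta$, which is the trade-off the bound in the lemma reflects.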
Lemma 8. Under Assumptions and uniformly over 5 C 5„ and r] G (0, 1), 
P(maxe(s) ^ c;?'°) = r] + o(l) and P(max(-e(s)) ^ d?'^) = r] + o(l). 

Proof. By Assumption AO nyl^(logp) ''^^ — t- 0. Therefore, I can choose a sequence {^n} of 
positive numbers such that ^„ — )• oo and ^^riA^ (log p)"^/^ — 0. Let 5 : M — [0, 1] be a function 
from the class C^(M, M) satisfying g{x) = 1 for x ^ and g{x) = for x ^ 1. Let gn{x) = 
g{in{\ogpY'\x - Finally, let /3„ = en{^ogpfl\ Then 

ll5i'^l|oologp//3„ < C„(logp)3/V/3„ ^ 0. 

In addition, 

(11^(3) + ^^11^(2) ^^2||^(i) 11^) < ^lnAl{\ogpf'^ ^ 0. 
Therefore, applying Lemma [7] yields 

E[5r„(maxe(s)) - 5'„(maxe(s))] 0. 
se5 sg5 




Finally, Lemma [5] gives 

P(maxe(s) ^ c^'°) ^ E[c/„(maxe(s))] ^ E[5r„(maxe(s))] + o(l) 
s£S s£S ses 

^ P(maxe(s) ^ c;?'° + l/(e„(logp)^/2)) j_ ^(^^ ^ ^ ^ 

The upper bound follows similarly. Combining the lower and the upper bounds gives the first 
result. The second result follows similarly. Note that all convergence statements hold uniformly 
over S C Sn and r] £ (0,1). □ 

Lemma 9. Under Assumptions J^Jl ^40, ^^[^ and there exists a sequence {ipn} of positive 
numbers converging to zero such that uniformly over S C 5„ and r] € (0, 1), P(c^^^^^ < c^) = o(l) 
anrfP(cf+^^ <4'°) = o(l). 

Proof. Denote 

T*^ = maxe(s) = max ai{s)ai€i and T'^'^ = maxe(s) = max ai{s)aiei. 

s£S s<=S ^ — ' s<=S s£S ^ — ^ 

Note that is the conditional rj quantile of given {Bi} and c^''' is the unconditional rj quantile 
of T'^'^. In addition, denote 

pi = max |e(s)| max |1 — {V{s)/V{s))^^^\, 

sG5 sG5 

P2 = max| aj(s)(CTi - o-i)ei| max(y(s)/y(s))^/^. 

Then [T*^ — T^'^\ ^ +P2- Combining Lemma H] and Assumption AH] gives 

pi = Op{{logp)^/^n-'''). 

Consider p2. Conditional on {di}, [di — (yi)ei is distributed as a A^(0, (a^ — (Jj)^) random variable, 
and so applying the argument like that in Lemma H] conditional on {^j} and using Assumptions 
Ad] and AI3] gives 

max I ai{s){ai - ai)ei\ = Op((logp)^/^n"''2). 

sG5 ^ — ^ 

Since maxs£5(y(s)/y(s))^/^ — )-p 1 by assumption AHJ this implies that 

P2 = Op((logp)i/2n-«2). 

Therefore, T*^ — T^'^ = Op((logp)^/^n^'^2^'*^), and so there exists a sequence {ipn} of positive 
numbers converging to zero such that 

V{\T^ - T'5'0| > (logp)^/2^-'^2A,C3) ^ 

Hence, 

P(P(|r'5 _ r'5'0| > (iogp)i/2^-«2AK3|{^j) > ^ 0. 
Let An denote the event that

I win take ipn = ipn + C{logp)n~'^^^'^^ for a constant C that is larger than that in the statement 
of Lemma [5j By assumption A[5l Vn. ~^ 0. Then note that 

PlT'^.o ^ ^ r? and P(r'5 ^ c^\{ai}) ^ v 

for any 77 € (0, 1). So, on A^, 

< PiT^^c^J- +(logp)i/2„-'^2A«3|{5.})+^^ 

where the last line uses LemmaO Therefore, on An, ^ S+i/'n' ^'^^ ''^^S+'i/'n ^ "^^-^ ~ '^(1)- '^^^ 
second claim follows similarly. □ 

Lemma 10. Let c^'^ denote the conditional i] quantile ofT^'^ = max^g^ ^j^^^^^ ai(s)eiej given 
{si}- Let Assumptions and ^40 hold. Then there exists a sequence {V'n} of positive 

numbers converqinq to zero such that Pic^'^- < Cn'^) = o(l) and P(c^'^- < Cv''^) = o(l) 
uniformly over S C Sn and r] £ (0, 1). 

Proof. I will invoke the following result recently obtained by Chernozhukov, Chetverikov, and 
Kato (2012). 

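Lemma 11. Let $Z^1$ and $Z^2$ be zero-mean Gaussian $p$-vectors with covariances $\Sigma^1$ and $\Sigma^2$
correspondingly. Then for any $g \in C^2(\mathbb{R}, \mathbb{R})$,

$$|E[g(\max_{1 \le j \le p} Z_j^1) - g(\max_{1 \le j \le p} Z_j^2)]| \le \|g^{(2)}\|_\infty \Delta_\Sigma/2 + 2\|g^{(1)}\|_\infty \sqrt{2\Delta_\Sigma \log p}$$

where $\Delta_\Sigma = \max_{1 \le j,k \le p} |\Sigma^1_{jk} - \Sigma^2_{jk}|$.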

Proof. See Theorem 1 in Chernozhukov, Chetverikov, and Kato (2012). □ 

Let = {El ^i^n '^«(^)^«^*}sG<s ^'^d Z'^ — {Z]i^j<„aj('S)<7iej}sg5. Conditional on these 
are zero- mean p-vectors with covariances and given by 



^liS2 = ai{si)ai{s^)ef and S^^^^ = ai{si)ai{s'^)a'^i 



Hysijaiyr^'^-'^ 

Let As = maxs^^sjes l^liS2 ~ ^siS2l' '^^^ following Lemma will be helpful. 



Lemma 12. (logp)2As = Op{l). 
Proof. Let u = Un = n^/^^'^'t'^^ where (/>i is given in Assumption Al5j Let ii = ejl{|ej| ^ n}, and
let df = E[e^]. It follows from assumption A[T]tliat P(maxi^j^„ \ei — ei\ = 0) — 1. In addition, 

uniformly over i = l,n, and so 



(logp)" 



ai{si)ai{s2){af - af] 



= (3) (logpY/u'^^ =(4) 0(1) 

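where (1) is by Assumption A1, (2) is by Hölder inequality, (3) is because $\sum_{1 \le i \le n} a_i(s)^2\sigma_i^2 = 1$ by construction, and (4) is by Assumption A5. Therefore,

$$(\log p)^2 \Delta_\Sigma = (\log p)^2 \max_{s_1, s_2 \in S} \Big| \sum_{1 \le i \le n} a_i(s_1)a_i(s_2)(\tilde\varepsilon_i^2 - \tilde\sigma_i^2) \Big| + o_p(1).$$

Note that $|a_i(s_1)a_i(s_2)(\tilde\varepsilon_i^2 - \tilde\sigma_i^2)| \le 2A_n^2u^2$. In addition,

$$E\Big[ \sum_{1 \le i \le n} \big( a_i(s_1)a_i(s_2)(\tilde\varepsilon_i^2 - \tilde\sigma_i^2) \big)^2 \Big] \lesssim A_n^2$$

uniformly over $s_1, s_2 \in S$ since $E[(\tilde\varepsilon_i^2 - \tilde\sigma_i^2)^2] \le E[\tilde\varepsilon_i^4] \le E[\varepsilon_i^4] \lesssim 1$ by Assumption A1. Hence, applying Bernstein inequality (see, for example, Lemma 2.2.9 in Van der Vaart and Wellner (1996)) gives for some $C > 0$,

$$P\Big( (\log p)^2 \Big| \sum_{1 \le i \le n} a_i(s_1)a_i(s_2)(\tilde\varepsilon_i^2 - \tilde\sigma_i^2) \Big| > t \Big) \le 2\exp\Big( -\frac{t^2}{C(\log p)^4A_n^2 + C(\log p)^2 t A_n^2u^2} \Big)$$

for any $t > 0$, and so by the union bound,

$$P\Big( \max_{s_1, s_2 \in S} (\log p)^2 \Big| \sum_{1 \le i \le n} a_i(s_1)a_i(s_2)(\tilde\varepsilon_i^2 - \tilde\sigma_i^2) \Big| > t \Big) \le 2\exp\Big( 2\log p - \frac{t^2}{C(\log p)^4A_n^2 + C(\log p)^2 t A_n^2u^2} \Big).$$

The result follows because Assumption A5 implies that $\log p = o(1/((\log p)^4A_n^2))$ and $\log p = o(1/((\log p)^2A_n^2u^2))$. □

It follows from Lemma 12 that there exists a sequence $\{\psi_n\}$ of positive numbers converging to zero such that

$$(\log p)^2 \Delta_\Sigma = o_p(\psi_n^4). \quad (11)$$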
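Let $g \in C^2(\mathbb{R}, \mathbb{R})$ be a function satisfying $g(t) = 1$ for $t \le 0$, $g(t) = 0$ for $t \ge 1$, and $g(t) \in [0, 1]$ for $t \in [0, 1]$, and let $g_n(t) = g((t - c^{S,0}_{\eta+\psi_n/2})/(c^{S,0}_{\eta+\psi_n} - c^{S,0}_{\eta+\psi_n/2}))$. Then

$$\|g_n^{(1)}\|_\infty \lesssim 1/(c^{S,0}_{\eta+\psi_n} - c^{S,0}_{\eta+\psi_n/2}) \lesssim (\log p)^{1/2}/\psi_n,$$

$$\|g_n^{(2)}\|_\infty \lesssim 1/(c^{S,0}_{\eta+\psi_n} - c^{S,0}_{\eta+\psi_n/2})^2 \lesssim (\log p)/\psi_n^2.$$

Applying Lemma 11 gives

$$D_n = \Big| E\Big[ g_n\Big(\max_{s \in S} Z^1_s\Big) - g_n\Big(\max_{s \in S} Z^2_s\Big) \Big| \{\varepsilon_i\} \Big] \Big| \lesssim (\log p)\Delta_\Sigma/\psi_n^2 + (\log p)(\Delta_\Sigma)^{1/2}/\psi_n = o_p(\psi_n) \quad (12)$$

by equation (11). Note that $\max_{s \in S} Z^1_s = T^{S,1}$ and, using the notation of the proof of Lemma 8, $\max_{s \in S} Z^2_s = T^{S,0}$. Then

$$P(T^{S,1} \le c^{S,0}_{\eta+\psi_n} | \{\varepsilon_i\}) \ge_{(1)} E[g_n(T^{S,1}) | \{\varepsilon_i\}] \ge_{(2)} E[g_n(T^{S,0}) | \{\varepsilon_i\}] - D_n$$

$$\ge_{(3)} P(T^{S,0} \le c^{S,0}_{\eta+\psi_n/2} | \{\varepsilon_i\}) - D_n =_{(4)} P(T^{S,0} \le c^{S,0}_{\eta+\psi_n/2}) - D_n \ge \eta + \psi_n/2 - D_n \quad (13)$$

where (1) and (3) are by construction of the function $g_n$, (2) is by equation (12), and (4) is because $T^{S,0}$ and $c^{S,0}_{\eta+\psi_n/2}$ are jointly independent of $\{\varepsilon_i\}$. Finally, note that the right-hand side of line (13) is bounded from below by $\eta$ w.p.a.1. This implies that $P(c^{S,0}_{\eta+\psi_n} < c^{S,1}_{\eta}) = o(1)$, which is the first asserted claim. The second claim of the Lemma follows similarly. □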
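
The proof of Lemma 10 only needs some twice continuously differentiable $g$ that equals 1 to the left of 0, equals 0 to the right of 1, and stays in $[0,1]$ in between. The paper does not name a specific $g$. As a hedged illustration, the sketch below uses a quintic "smoothstep" polynomial, which is one possible choice (an assumption of this example), together with the rescaling that produces $g_n$ from two quantile values. The arguments `c_lo` and `c_hi` stand in for the two critical values in the definition of $g_n$ and are hypothetical names.

```python
from typing import Callable


def g(t: float) -> float:
    """A C^2 function with g(t) = 1 for t <= 0 and g(t) = 0 for t >= 1.

    On [0, 1] it equals 1 - (6 t^5 - 15 t^4 + 10 t^3). The first and second
    derivatives of the polynomial vanish at t = 0 and t = 1, so the pieces
    glue together with two continuous derivatives.
    """
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    return 1.0 - (6.0 * t**5 - 15.0 * t**4 + 10.0 * t**3)


def make_g_n(c_lo: float, c_hi: float) -> Callable[[float], float]:
    """Rescale g so that it drops from 1 to 0 between c_lo and c_hi."""
    if not c_hi > c_lo:
        raise ValueError("c_hi must be strictly larger than c_lo")
    width = c_hi - c_lo

    def g_n(t: float) -> float:
        return g((t - c_lo) / width)

    return g_n


g_n = make_g_n(c_lo=1.0, c_hi=3.0)
values = [g_n(1.0 + 0.1 * k) for k in range(21)]  # grid on [1, 3]
```

Because the rescaling divides the argument by `c_hi - c_lo`, the first derivative of `g_n` scales like the inverse of that width and the second derivative like its inverse square, which is the source of the two sup-norm bounds used in the proof.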
Lemma 13. Under Assumptions and there exists a sequence $\{\psi_n\}$ of

S 

positive numbers converging to zero such that uniformly over S <Z Sn and rj G (0, 1), P{c^^^^ < 
<) = o(l) andP«+^„<4'°) = o(l)§ 

Proof. Lemma [10] established that 

Therefore, it suffices to show that 

P{c^^T < cf'^) = 0(1) and P(c'^'V < 4) = o{l). 
for some sequence {"i/^n} of positive numbers converging to zero. Denote 

pi = max I aj(s)eiej| max |1 — (y(s)/F(s))"'"/^|, 

P2 = max| aj(s)(aj - ej)ej| max(F(s)/F(s))^/^. 

Note that jT"^ — T^'^\ ^ pi + P2 and that by Lemmas H] and [lOl max^g^ | "^^is^is^n = 
Op((logp)^/^). Therefore, the result follows by the argument similar to that used in the proof of 
Lemma [9] since di — ei = Op{n''^^) by assumption A[2l □ 



^Note that Lemmas and [T51 provide the same results under two different methods for estimating en 
Lemma 14. Let Assumptions and hold. In addition, let either Assumption or

M hold. Then P(5^ C 5^^) ^ 1 - 7„ + o(l) and P(5^ C S^S) ^ i _ ^„ + o(l). 

Proof. Suppose that S^\S^^ ^ 0. Then there exists the smallest integer I such that 7^ 0, 

and so 5^ C S^^^ (if / = 1, let 5^ = Therefore, cf__^^ ^ '^i-7„- follows that there exists 
an element s of such that 

fis) + sis) ^ -cf_V - c[-X ^ -cf_V - cf_,„, 

and so 

P(5„«\5f ^ / 0) ^ P(min (/(s) + < -cf.^ - cf_,J 

^(1) P((min(/(s) + e(s))V ^ -cfi^„ - cf^J 



/ 13// ■ / / \ ^-f.O / P/,0 ij,0 N , /IN 



=W ^ ^-C-^JVV - 1) + c-L\.^jV) + 0(1) 

^(5) P((max(-e(s)) ^ cflO^^^^/V - C7(logp)i/2n-''3/(^^ + ^^)) + ^(i) 

^(■6~i P((max(— e(s)) ^ cf '° , x , , s) + o(l) 

^(7) ln + 1pn + C(logp)n"''V(7n + V'n) + o(l) =(§) 7n + o{l) 

where (1) follows from the definitions of f{s) and e(s), (2) is by the definition of tpn, (3) is by the 
definition of 5^, (4) is rearrangement, (5) is by Lemma U] and Assumption A^l (6) is by Lemma 
[5l (7) is by Lemma [HI and (8) follows from the definition of ipn again. The first asserted claim 
follows. The second claim follows from the fact that C 5^'^. □ 

Lemma 15. Let Assumptions and hold. In addition, let either Assumption or 

41 hold. Then P(max,g5^^\5_R(/(s) + e{s)) ^ 0) ^ 1 - 7„ + o(l). 

Proof. The result follows from 

P( max (/(s)+e(s))^0)=P( max (/(5)+£(5))^0)^(i)P( max 6(s) ^ cfi:^° 

S^Sn\S^ S^Sn\S^ S^Sn\Sj^- 

^ P(maxe(s) ^ cfi'° . ) =(2) 1 - 7„ - ^„ + o(l) =(3) 1 - 7n + o(l) 
where (1) follows from the definition of 5^, (2) is by Lemma [HI and (3) is by the definition of 
C.2. Proofs of Theorems.

Proof of Theorem [Jl Note that 

P(r ^ cf_ J = P(max(/(.) + s{s)) ^ cf„ J ^(i) P(max(/(s) + e(s)) ^ cf_ J - 7n + o{l) 
^(2) P(max(/(s) + e(s)) ^ cf_J - 27, + o(l) ^(3) P(maxe(s) ^ cf_„) - 27„ + o(l) 



^(4) P(maxe(s)V cfl°„_^J - 27„ + o(l) =(5) P(max6(s) ^ cfl°„_^^/V) - 27^ + o(l) 



^(6) P(max£(5) ^ c^^'L_^„(l - n-'^^)) - 27„ + 

^(-7-1 P(max ^ cf ''^ , x , , J — 27„ + o(l) 

sG5^ l-o--0„-C{Iogp)n '=3/{o+i/)„)'' ' w 

=(8) 1 - a - - C(logp)n-''V(a + V'n) - 27n + 0(1) =(9) 1 - a + o(l) 

where (1) fohows from Lemma \TE\ (2) is by Lemma [Ml (3) is because mider Ho f{s) ^ 0, (4) 
fohows from the definitions of and ipn, (5) is rearrangement, (6) is by Assumption AlH (7) 
is by Lemma m (8) is by Lemma El and (9) is by the definitions of V'n and 7„. The first asserted 
claim fohows. 

In addition, when / is identicahy constant, 
P(r ^ cf_J =(1) P(maxe(s) ^ cf_„) ^(2) P(maxe(s) < cfi„) + 7n + 
^(3) P(maxe(s) < cfif+^J + 7n + o(l) ^(4) P(maxe(s) < cfif+^Jl + n''^^)) + 7n + 
^(5) P(maxe(s) < cfi^^^^^^^^^j^g^^^.^g/^^.^^P + 7n + o(l) ^(6) 1 " " + 

where (1) follows from the fact that /(s) = whenever / is identically constant, (2) follows from 
Lemma [m (3) is by the definition of (4) is by Assumption AjU (5) is by Lemma [6l and (6) 
is from Lemma [8] and the definitions of 7„ and ipn- The second asserted claim follows. □ 

Proof of Theorem 2. Suppose that $f(x_2) < f(x_1)$ for some $x_1, x_2 \in [s_l, s_r]$ satisfying $x_2 > x_1$. By the mean value theorem, there exists $x_0 \in (x_1, x_2)$ satisfying

$$f'(x_0)(x_2 - x_1) = f(x_2) - f(x_1) < 0.$$

Therefore, $f'(x_0) < 0$. Since $f'(\cdot)$ is continuous, $f'(x) < f'(x_0)/2$ for any $x \in [x_0 - \Delta_x, x_0 + \Delta_x]$ for some $\Delta_x > 0$. Take $s = s_n \in S_n$ as in Assumption A7 applied to the interval $[x_0 - \Delta_x, x_0 + \Delta_x]$. By Assumptions A1 and A7-(ii), $V(s) \le Cn^3$. In addition, combining Assumptions A8, A7-(i), and A7-(iii) gives

$$\sum_{1 \le i,j \le n} (f(X_i) - f(X_j))\,\mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s) \ge Cn^2 \quad (14)$$

for some $C > 0$. Further, since $\sum_{1 \le i \le n} a_i(s)^2\sigma_i^2 = 1$, Assumption A1 implies $A_n \ge C/n^{1/2}$ for some $C > 0$, and so Assumption A5 gives $\log p = o(n)$. Therefore,

$$P(T \le c^{P}_{1-\alpha}) \le_{(1)} P(T \le c^{PI}_{1-\alpha}) \le_{(2)} P(T \le c^{PI,0}_{1-\alpha+\psi_n}) + o(1)$$

$$\le_{(3)} P(T \le C(\log p)^{1/2}) + o(1) \le_{(4)} P(\hat f(s) + \hat e(s) \le C(\log p)^{1/2}) + o(1)$$

$$\le_{(5)} P(f(s) + e(s) \le C(\log p)^{1/2}(1 + n^{-\kappa_3})) + o(1)$$

$$\le_{(6)} P(f(s) + e(s) \le 2C(\log p)^{1/2}) + o(1)$$

$$\le_{(7)} P(e(s) \le 2C(\log p)^{1/2} - Cn^{1/2}) + o(1) \le_{(8)} P(e(s) \le -Cn^{1/2}) + o(1)$$

$$\le_{(9)} P\Big(\max_{s \in S_n}(-e(s)) \ge Cn^{1/2}\Big) + o(1)$$

$$\le_{(10)} P\Big(\max_{s \in S_n}(-e(s)) \ge c^{PI,0}_{1 - C(\log p/n)^{1/2}}\Big) + o(1) \le_{(11)} C(\log p/n)^{1/2} + o(1) = o(1)$$

where (1) follows from $S_n^{P} \subset S_n^{PI}$, (2) is by the definition of $\psi_n$, (3) is by Lemma 4, (4) is since $T = \max_{s \in S_n}(\hat f(s) + \hat e(s))$, (5) is by Assumption A3, (6) is obvious, (7) is by equation (14) and the fact that $V(s) \le Cn^3$, (8) follows from $\log p = o(n)$, (9) is obvious, (10) is by Lemma 4 and Markov inequality, and (11) follows by Lemma 8. The result follows. □

Proof of Theorem 3. The proof follows from an argument similar to that used in the proof of Theorem 2 with equation (14) replaced by

$$\sum_{1 \le i,j \le n} (f(X_i) - f(X_j))\,\mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s) \ge C l_n n^2$$

and condition $\log p = o(n)$ replaced by $\log p = o(l_n^2 n)$. □

Proof of Theorem^ Since inf^gj^^ ,5^] f^^\x) < —ln{logp/n)^^^'^^^^\ for sufficiently large n, there 
exists an interval [xn,i,Xn^2] C [si,Sr.] such that \xn^2 — Xn,i\ = C^hn and for all x G [xn,i,Xn,2], 
f^^\x) < — /„(logp/n)'^/(^'^+'^)/2. Take s = s„ G 5„ as in Assumption AOapplied to the interval 
[xn,i,Xn,2] By Assumptions A[T1 AlHl and Al9]-(ii), V{s) ^ C{nh)^h^. In addition, combining 
Assumptions AEl Al9l-(i), and AID- (in), 

ifiXi) - f{Xj))sign{Xj - XMX„X„s) ^ lnChl+^+\nhf 

for some C > 0, and so /(s) ^ Inh]^^ {nhY^'^ . From this point, since logp = o{lnh^^^n), the 
argument like that used in the proof of Theorem [2] yields the result. □ 

Proof of Theorem\^ Consider any sequence {Xi} satisfying Assumption AlHJ Let h = = 
Co(log n/n)-'^/^^^^'^) for sufficiently small Co > 0. Let L = [{sr — s/)/(4/i)] where [x] is the largest 
integer smaller or equal than x. For I = 1,L, let xi = Ah{l — 1) and define /; : — )• M by 
fi{si) = 0, fl^\x) = Oifx ^ xu f!'\x) = -L{x-xif \ix e {xuxi+h], f^^\x) = -L{xi+2h-xf 
if X G {xi + h,xi + 2h] , //^^ (x) = L{x-xi- 2hf if x G {xi + 2/i, + 3/i] , //^^ (x) = L{xi+4:h-xf 
if $x \in (x_l + 3h, x_l + 4h]$ and $f_l^{(1)}(x) = 0$ otherwise. In addition, let $f_0(x) = 0$ for all $x \in [s_l, s_r]$.
Finally, let $\{\varepsilon_i\}$ be a sequence of independent $N(0, 1)$ random variables.



For / = 0,L, consider a model Mi = M^^i with the sequence of design points {Xi}, the 
regression function and the noise {£«}. Note that Mq belongs to M. and satisfies Hq. In 
addition, for / ^ 1, M/ belongs to does not satisfy Ho, and, moreover, has inf^-gj^^ 3^,] fl^\x) < 
-(7(logn/n)/'/(2/3+3). 

Consider any test -0 = "0(^1, •••) ^n.) such that Ejv/oiV'] ^ a + o(l). Then following the argument 
from Dumbgen and Spokoiny (2001) gives 

inf ^mW\ - a ^ min EAfJV] " ^mM + ^ Y] ^Nh[^]/L - EmqM + o(l) 
= EA/oM/i-EAfoM+o(l)= ^ EMo[V'(pi-l)]/^ + o(l) 

^EA./oiV'l 5; p,/L-1|]+o(1)^Ea/o[| Pl/L-l\]+o{l) 
where pi is the likelihood ratio of observing {l^ji^j^n under the models Mi and Mq. Further, 
p; = exp( YifiiXi)- Y fi(^^f/A = eM^n4n,i - '^li^ 

where ujn,i = /K^i)^)^^^ and ^n,/ = I]i^i^n ^j//(-'^j)/^n,/- Note that under the model 

Mq, {S,n,i}i(iKL is a sequence of independent A^(0, 1) random variables. In addition, by the 
construction of the functions // and since Assumption E] holds, ujn,i ^ Cn^/^h^+^/^ = C(logn)i/2 
where C can be made arbitrarily small by selecting sufficiently small Cq. Therefore, 



emo[| Y p^i^ - i|] ^ (Em„[( Y p^i^ - 1)'])'^' ^ ( E ^mApI/L^])" 

IsiKL IsiKL 



1/2 

■^jwo L/^i / ^ J ; 

IsiKL l^isSL 

^ (exp((72 log n - log L))^/^ ^ exp((C72 ^ _ /2) = o(l) 



because C is arbitrarily small and logn < logL. Therefore, mfM^M2 Ea/['0] ^ « + o{l), and so 
the result follows. □ 



Appendix D. Proofs for Section [5] 

Proof of Lemma 1. Let $X$ be a random variable distributed according to the law $P_X$. Then $\{X_i\}$ is an i.i.d. sample from the distribution of $X$. Let $I_i = 1\{X_i \in [x_1, x_2]\}$ for $[x_1, x_2] \subset [s_l, s_r]$. Then $E[I_i] = p = P_X([x_1, x_2]) > 0$. By Hoeffding inequality (see, for example, Appendix B in Pollard (1984)),

$$P\Big( \sum_{1 \le i \le n} I_i < pn/2 \Big) = P\Big( \sum_{1 \le i \le n} (I_i - E[I_i]) < -pn/2 \Big) \le \exp(-p^2n^2/(8n)) = \exp(-p^2n/8).$$

Since $\sum_{1 \le n < \infty} \exp(-p^2n/8) < \infty$, the first asserted claim follows by the Borel-Cantelli Lemma.
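
As a quick numerical sanity check of the first claim, the sketch below compares the Hoeffding-type bound $\exp(-p^2 n/8)$ with a simulated frequency of the event that fewer than $pn/2$ of $n$ i.i.d. draws fall in an interval of probability $p$. The function names, the seed, and the parameter values ($p = 0.5$, $n = 40$, 2000 replications) are assumptions chosen for illustration, not values from the paper.

```python
import math
import random


def hoeffding_bound(p: float, n: int) -> float:
    """Upper bound exp(-p^2 n / 8) on P(sum of indicators < p n / 2)."""
    return math.exp(-p * p * n / 8.0)


def empirical_frequency(p: float, n: int, n_rep: int, seed: int = 0) -> float:
    """Simulated frequency of the event {number of hits < p n / 2}.

    Each replication draws n independent indicators that equal one with
    probability p, mimicking I_i = 1{X_i in [x_1, x_2]}.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        count = sum(1 for _ in range(n) if rng.random() < p)
        if count < p * n / 2.0:
            hits += 1
    return hits / n_rep


bound = hoeffding_bound(0.5, 40)
freq = empirical_frequency(0.5, 40, n_rep=2000, seed=1)
```

The bound is summable in $n$ for any fixed $p > 0$, which is exactly what the Borel-Cantelli step requires. The simulated frequency is typically far below the bound, since the constant 8 in the exponent is not tight.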

To prove the second claim, let Un = [1/ {C^n^'^/'^)\ + 1 where [•] denotes the largest integer that 
is smaller or equal than the quantity inside the brackets. Let si = Xn,o < Xn,i < ... < Xn,Un = ■^r- 
where Xn,u — Xn,u-i = {sr — si)/Un = hno- It clearly suffices to show that for almost all realizations 
{Xi} there exists an integer such that for any n ^ N, 

C^nhno = l,n : Xi £ x„, J}| ^ Cenhno 



for all « = l,Un- Let Pn^u = Pxi[xn,u-i, Xn,u]) ■ Then by Assumption, there exist constants C_ and 

C such that ChnO ^ Pn,u < ChnO- Let Ii^n,u = HXi G [Xn,u-l,Xn^u]}- Then E[Ii^n,u] = E[/^„_J = 

Pn,u, and so Bernstein inequality gives 

P( Ii,n,u > 2Cnhno) ^ P( X] ~ ^ihn,u]) > CnKo) 

^ exp(— C^n^/i^g/(2Cn/i„o + 4Cn/i„o/3)) ^ exp(— Cn/i„o) 
for some C > 0. Then by the union bound, 

P( max ^ Ii^n,u - 2Cnhno ^ 0) ^ ^ P( X] ^ 2Cn/i„o) 

^ exp(C(log(l//i„o) - n/ino)) ^ exp(-Cn^/^). 

Since X^isgn^oo ^^P(~^^^^^) ^ Borel-Cantelli Lemma implies that for almost all realizations 
{Xi} there exists such that for any n'^ N, \{i = l,n : Xi £ [xn,u~i,Xn,u]}\ ^ C^nhno for all 
ti = 1, f/„ as long as Cg > 2C. The lower bound follows similarly. Combining these bounds gives 
the second asserted claim. □ 

Proof of LemmalM For B > 0, let Un,B = Bv}^^^^'^\ In addition, define An^B as the event 



that {maxi^i^n ^ Un^s}- Note that P(yln,s) — 1 as — >• oo uniformly over n = l,cx) by 
Assumption A[TJ Further, 

E[|a2 - ^(1) E[|af - a,2|]/P(A„,B) ^(2) (E[(af - a,2)2])i/VP(^n,B) 

^(3) (e[( {Y^^^-Y,f/{2\jm-<ylf]\ /nAn,B) 

<(4) (1/1 J(i)|^/' + 6n)/P(^n,B) <(5) (l/(n6n)'/' + &n)/P(^n,B) <(6) &n/P(^n,B) 

where (1) follows from the definition of conditional expectation, (2) is by Jensen inequality, (3) 
is by the definition of the local version of Rice's estimator, (4) is by Assumptions (iv) and (v), 
(5) follows from Assumption (iii), and (6) is from Assumption (ii). In addition, exponential 
concentration inequality for functions with bounded differences (see, for example, Theorem 12 in
Boucheron, Bousquet, and Lugosi (2004)) gives for any t > 0, 

PiWaf - af\ - E[\af - af\\An,B]\ > t\An,B) < 2exp(-C7| J(i)|tV<B) 
for some C > 0, and so using the fact that | J(i)| > Cnbn, the union bound with t = bn yields 
P(max |af - af\ > C6„(l + 1/P(A„,b))| A.s) < exp(logn - n'^/^^+'^^ft^) ^ ^(^^ 

for any given B > where the last equality follows by Assumption (ii). Therefore, maxi<gj<g„ {o'f — 
af \ = Op{bn). Finally, since fij is bounded from above and away from zero uniformly over i by 
Assumption A[T1 it follows that maxi^j^„ \ai — ai\ = Op{bn), which is the asserted claim. □ 
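
The proof above concerns a local version of Rice's variance estimator, whose exact definition is given in Section 5 and is not reproduced in this appendix. As a hedged illustration only, the sketch below implements one common form of such an estimator: for each observation it averages squared differences of adjacent responses (after sorting by the regressor) over the neighbors within a bandwidth, and divides by two. The function name `local_rice_variance`, the use of the number of adjacent pairs as the normalizer, and the bandwidth argument `b` (playing the role of $b_n$) are assumptions of this example and may differ in detail from the paper's definition.

```python
from typing import List, Sequence


def local_rice_variance(x: Sequence[float], y: Sequence[float], b: float) -> List[float]:
    """Difference-based local variance estimates, one per observation.

    For observation i, the window contains the points whose regressor value
    lies within b of x[i]. The estimate is the sum of squared differences of
    adjacent responses inside the window divided by twice the number of
    adjacent pairs (an assumed normalization).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    estimates = [0.0] * len(x)
    for pos, i in enumerate(order):
        # The window is contiguous in sorted order, so track its end points.
        lo = pos
        while lo > 0 and xs[pos] - xs[lo - 1] <= b:
            lo -= 1
        hi = pos
        while hi < len(xs) - 1 and xs[hi + 1] - xs[pos] <= b:
            hi += 1
        if hi == lo:
            raise ValueError("bandwidth too small: window holds a single point")
        squared = sum((ys[j + 1] - ys[j]) ** 2 for j in range(lo, hi))
        estimates[i] = squared / (2.0 * (hi - lo))
    return estimates


# Alternating responses: every adjacent squared difference equals 1.
x_demo = [3.0, 1.0, 2.0, 0.0]
y_demo = [1.0, 1.0, 0.0, 0.0]
sigma2_hat = local_rice_variance(x_demo, y_demo, b=1.0)
```

Differencing adjacent responses removes a smooth regression function to first order, so the estimator targets the noise variance without estimating $f$ first. The bandwidth controls the usual trade-off between the number of pairs averaged and how local the estimate is.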

Proof of Lemma\^ Let s = {x,h) G 5„. Since h ^ {sr — si)/2, I have either si + h ^ x oi 
X + h ^ Sr- I will consider the former case. The result for the latter case follows from the same 
argument. Let Ci G (0, 1). Since the kernel K is continuous and strictly positive on its support, 
miujgjQ (5^] K{t) > 0. In addition, since K is bounded, I can find a constant C2 € (0, 1) such that 

2Ce{l - (72)^+^ max K{t) ^ CsCfCi min K{t) (15) 

te[-i-C2] telo,Ci] 

where the constant k appears in the definition of kernel weighting functions. 
Then for Xi e [x - {1 + C2)h/2, x - Cih], 

^ sign(X,- - Xi)\X, - X,\'K{{Xj - x)/h) 

^(1) iC2h)'KiiX,-x)/h)- Yl {{l-C2)hfK{{X^-x)/h) 

>(2) {C2hfC^Cinh min K{t) - ((1 - C2)hfCQ{l - C2)nh max K{t) 

te[o,Ci] te[-i-C2] 

^f3) iC2hfC^Cinh min Kit) 12 -^u. Cnh^~^^ 
te[o,Ci] 

for some C > that depends only on {Cj : j = 3, 8}, Ci, C2, and the kernel K where (1) follows 
from the fact that Xi ^ x — C2h, (2) is by Assumption Al8l (3) is by equation p^ . and (4) is 
because min^g^ K{t) > 0. Then for Mn{x, h) = {i = l,n : Xi e [x — (1 + C2)h/2, x — C2h]}, 

2 

V{s) = 5^ Y ^'Sn(.Xj-Xi)Q{Xi,Xj,, 

Y (^fK{{X, - x)/hf I - - ^i\^K{{X^ - x)/h) 

^ Y a2ir((X,-x)//i)M ^ sign(X, -Xi)|X, -X,|^-i^((X, -x)//i) , 
and so $V(s) \ge C(nh)^3h^{2k}$ by Assumptions A1 and A8, where $C > 0$ does not depend on $(x, h)$.
Therefore, claim (a) follows since

$$\Big| \sum_{1 \le j \le n} \mathrm{sign}(X_j - X_i)\,Q(X_i, X_j, s) \Big| \le Cnh^{k+1}.$$



Further, under Assumption AO 

\V{s)-V{s)\ 

< E \^f-<^f\KiiX,-x)/hfi ^ signiX,-X,)\X,-X,\''Ki{X,-x)/h) 



l<i<n 



^ max \af - a}\ K{{Xi - x)/hf [ ^ sign(X, - X^{)\X, - X,\^K{{X, - x)/h) j , 

and so \V{s) — V{s)\ ^ C{nh)^h^^Op{n~'^^). Combining this bound with the lower bound for 
V{s) established above shows that under Assumption Al3l \V{s)/V{s) — 1| = Op{n^'^^), and so 

\{V{s)/V{s)fl'-l\ = Op{n-^-) 

uniformly over 5„, which is the asserted claim (b). 
To prove the last claim, note that 



where 



his) 



his) 



\Vis)-Vis)\^Iiis)+l2is) 



E(^'-^') E signiX,-XMX,,X,,s) 



ls£js£n 



Consider As in the proof of Lemma [T^ let u = Un = n^/('^+'^^) where (f)i G (— 2,(/>). Let 

Ei = eil{\ei\ ^ u} and af = E[e?]. It follows from Assumption A[T]that P(maxi^j^„ \ei — ei\ = 
0) 1, and ^ erf - < l/u^^* uniformly over i = T~fi. Then his) < /ii(s) + inh)^ h"^^ / u'^+'^ 
w.p.a.l where 



hiis) 
Applying Bernstein inequality and using the union bound yields



and so 



P(max/ii(s)/y(s) > t) ^ 2exp(logp - C(n/imm)iV(l + u^t)), 

s£S„ 



P(max/ii(s)/y(s) > Cn-^^) 

S&Sn 



for any C > as long as conditions of the Lemma hold. 
Consider /2(s). Clearly, 

his) ^ Yl ((^^ - ^i)^ + - ^^\) I sign(^i - X,i)Q{X^,Xj,s) 

^ Op{n-^') J] + I sign(X, -X,)Q(X„X„s) I , 

and so l2{s)/V{s) = Op(l) uniformly over s G 5„ by arguments similar to those used above. 
Combining presented results gives the asserted claim (c). □ 

Appendix E. Proofs for Section [6] 

Proof of TheoremlE Denote = f{X,) + ei. Then Yi = Y^ + Zf /3 and Yi = Y^ - Zf (/3 - 

Therefore, \Yi — Y^ \ ^ — = Op{l/^/n) uniformly over i = l,...,n and all models in 
MpL- So, 

T = max ai{s)Yi = max ai{s)Y^ + o„(l/ \/\ogp) 



Since 



s£Sn S&Sn 



max V |ai(s)(yi-yi°)| =max V \ai{s){Yi - Y^)\Op{l) 



= max V |ai(s)|Op(l/V^)Op(l) = o{^/n/ log p)Op{l/^/^)Op{l) = Op{l/^/i^). 
The result follows by the argument similar to that used in the proof of Theorem [H □ 



Proof of Theorem The proof relies on the same notation as introduced in Section [B] of the 
Appendix with f(x,z), Q{xi, zi,X2, Z2,s), s, Sn, and substituting /(x), Q{xi, X2, s), s, Sn, and 
P- 

For S C Sn and rj G (0,1), let c^'^ be the conditional rj quantile of maxgg^ ^-|^^-^^ ai(s)ejej 
given {sj}. Since Anilogpy^'^ — )• 0, applying Corollary 6 (SC-d) of Chernozhukov, Chetverikov, 
and Kato (2012) shows that 

P(maxe(s) ^ c^'^) = r? + o(l) 
uniformly over all $S \subset S_n$ and $\eta \in (0, 1)$, and so $\max_{s \in S_n} |e(s)| = O_p(\sqrt{\log p})$. In addition, the
result of Lemma 10 holds under the conditions (i)-(iv) of Assumption A12, and so

P(maxe(s) ^ •°) = rj + o(l) 

uniformly over all S (Z Sn and r] G (0, 1), which gives the result analogous to that in Lemma [8j 
Further, 

r = max y2 ai{s)Yi= max_ ai{s){f{Xi, z) + + Op(l/v^logp) 

by conditions (v) and (vi) of Assumption A12i Therefore, the result follows by the argument 
similar to that used in the proof of Theorem [TJ 

□ 
Supplementary Appendix

This supplementary appendix contains additional simulation results. In particular, I consider
the test developed in this paper with weighting functions of the form given in equation (2) with
$k = 1$. The simulation design is the same as in Section 7. The results are presented in Table 2.
For ease of comparison, I also repeat the results for the tests of GSV, GHJK, and HH in this
table. Overall, the simulation results in Table 2 are similar to those in Table 1, which confirms
the robustness of the findings in this paper.
Table 2.

                           Proportion of Rejections for
Noise    Case  Sample  GSV      GHJK     HH     CS-PI  CS-OS  CS-SD  IS-PI  IS-OS  IS-SD
normal   1     100     .118     .078     .123   .129   .129   .129   .166   .166   .166
normal   1     200     .091     .051     .108   .120   .120   .120   .144   .144   .144
normal   1     500     .086     .078     .105   .121   .121   .121   .134   .134   .134
normal   2     100     0        .001     0      .002   .009   .009   .006   .024   .024
normal   2     200     0        .002     0      .001   .012   .012   .007   .016   .016
normal   2     500     0        .001     0      .002   .005   .005   .005   .016   .016
normal   3     100     0        .148     .033   .238   .423   .432   0      0      0
normal   3     200     .010     .284     .169   .639   .846   .851   .274   .615   .626
normal   3     500     .841     .654     .947   .977   .995   .996   .966   .994   .994
normal   4     100     .037     .084     .135   .159   .228   .231   .020   .040   .040
normal   4     200     .254     .133     .347   .384   .513   .515   .372   .507   .514
normal   4     500     .810     .290     .789   .785   .833   .833   .782   .835   .836
uniform  1     100     .109     .079     .121   .120   .120   .120   .200   .200   .200
uniform  1     200     .097     .063     .109   .111   .111   .111   .154   .154   .154
uniform  1     500     .077     .084     .107   .102   .102   .102   .125   .125   .125
uniform  2     100     .001     .001     0      0      .006   .006   .015   .031   .031
uniform  2     200     0        0        0      .001   .009   .009   .013   .021   .024
uniform  2     500     0        .003     0      .003   .012   .012   .011   .021   .021
uniform  3     100     0        .151     .038   .225   .423   .433   0      0      0
uniform  3     200     .009     .233     .140   .606   .802   .823   .261   .575   .590
uniform  3     500     .811     .582     .947   .976   .993   .994   .971   .990   .991
uniform  4     100     .034     .084     .137   .150   .216   .219   .020   .046   .046
uniform  4     200     .197     .116     .326   .355   .483   .488   .328   .466   .472
uniform  4     500     .<S()3   .2().")  .789   .803   .832   .85.5  .790   .859   .801

Nominal size is 0.1. GSV, GHJK, and HH stand for the tests of Ghosal, Sen, and van der Vaart (2000),
Gijbels, Hall, Jones, and Koch (2000), and Hall and Heckman (2000) respectively. CS-PI, CS-OS, and
CS-SD refer to the test developed in this paper with $\sigma_i$ estimated using Rice's formula and plug-in,
one-step, and stepdown critical values respectively. Finally, IS-PI, IS-OS, and IS-SD refer to the test
developed in this paper with $\sigma_i$ estimated by $\hat\sigma_i = \hat\varepsilon_i$ and plug-in, one-step, and stepdown critical values
respectively.
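
Each entry of the table is a proportion of rejections across simulated samples. A minimal sketch of how such a proportion can be computed is given below. Everything specific in it is an assumption made for illustration: the data-generating function `simulate_flat` (uniform design, identically zero regression function, normal noise with standard deviation 0.05), the placeholder decision rule `toy_rejects` (a cutoff on a least-squares slope), and the number of replications. In particular, `toy_rejects` is not the test developed in this paper, nor any of the GSV, GHJK, or HH tests.

```python
import random
from typing import Callable, List, Tuple

Sample = List[Tuple[float, float]]


def rejection_proportion(
    simulate: Callable[[random.Random], Sample],
    rejects: Callable[[Sample], bool],
    n_rep: int,
    seed: int = 0,
) -> float:
    """Share of simulated samples on which the decision rule rejects."""
    if n_rep <= 0:
        raise ValueError("n_rep must be positive")
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        if rejects(simulate(rng)):
            rejections += 1
    return rejections / n_rep


def simulate_flat(rng: random.Random, n: int = 100) -> Sample:
    """Hypothetical design: X uniform on [-1, 1], f identically zero."""
    return [(rng.uniform(-1.0, 1.0), rng.gauss(0.0, 0.05)) for _ in range(n)]


def toy_rejects(sample: Sample, threshold: float = -0.05) -> bool:
    """Placeholder rule: reject when the least-squares slope is below a cutoff."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    return sxy / sxx < threshold


proportion = rejection_proportion(simulate_flat, toy_rejects, n_rep=200, seed=0)
```

Passing the random generator into `simulate` keeps every replication reproducible from a single seed, so two decision rules can be compared on identical simulated samples by reusing the same seed.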



